MyArxiv
Robotics
FlySearch: Exploring how vision-language models explore
The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. We define three sets of scenarios with varying difficulty and observe that state-of-the-art VLMs cannot reliably solve even the simplest exploration tasks, with the gap to human performance increasing as the tasks get harder. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning. We publicly release the benchmark, scenarios, and the underlying codebase.
Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
Symbiosis of Agents is a large-scale installation by Baoyang Chen (baoyangchen.com) that embeds AI-driven robots in an immersive, mirror-lined arena, probing the tension between machine agency and artistic authorship. Drawing on early cybernetics, rule-based conceptual art, and seminal robotic works, it orchestrates fluid exchanges among robotic arms, quadruped machines, their environment, and the public. A three tier faith system pilots the ecology: micro-level adaptive tactics, meso-level narrative drives, and a macro-level prime directive. This hierarchy lets behaviors evolve organically in response to environmental cues and even a viewer's breath, turning spectators into co-authors of the unfolding drama. Framed by a speculative terraforming scenario that recalls the historical exploitation of marginalized labor, the piece asks who bears responsibility in AI-mediated futures. Choreographed motion, AI-generated scripts, reactive lighting, and drifting fog cast the robots as collaborators rather than tools, forging a living, emergent artwork. Exhibited internationally, Symbiosis of Agents shows how cybernetic feedback, robotic experimentation, and conceptual rule-making can converge to redefine agency, authorship, and ethics in contemporary art.
Systems and Control (CS)
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN
We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that with any two of these three difficulties, a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: (i) "economic" rewards without stabilizing effects; (ii) short training horizons; and (iii) uncertain systems.
Systems and Control (EESS)
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN
We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that with any two of these three difficulties, a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: (i) "economic" rewards without stabilizing effects; (ii) short training horizons; and (iii) uncertain systems.
Robotics
EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment
Deep reinforcement learning agents are often fragile while humans remain adaptive and flexible to varying scenarios. To bridge this gap, we present EDEN, a biologically inspired navigation framework that integrates learned entorhinal-like grid cell representations and reinforcement learning to enable autonomous navigation. Inspired by the mammalian entorhinal-hippocampal system, EDEN allows agents to perform path integration and vector-based navigation using visual and motion sensor data. At the core of EDEN is a grid cell encoder that transforms egocentric motion into periodic spatial codes, producing low-dimensional, interpretable embeddings of position. To generate these activations from raw sensory input, we combine fiducial marker detections in the lightweight MiniWorld simulator and DINO-based visual features in the high-fidelity Gazebo simulator. These spatial representations serve as input to a policy trained with Proximal Policy Optimization (PPO), enabling dynamic, goal-directed navigation. We evaluate EDEN in both MiniWorld, for rapid prototyping, and Gazebo, which offers realistic physics and perception noise. Compared to baseline agents using raw state inputs (e.g., position, velocity) or standard convolutional image encoders, EDEN achieves a 99% success rate, within the simple scenarios, and >94% within complex floorplans with occluded paths with more efficient and reliable step-wise navigation. In addition, as a replacement of ground truth activations, we present a trainable Grid Cell encoder enabling the development of periodic grid-like patterns from vision and motion sensor data, emulating the development of such patterns within biological mammals. This work represents a step toward biologically grounded spatial intelligence in robotics, bridging neural navigation principles with reinforcement learning for scalable deployment.
Adjusting Tissue Puncture Omnidirectionally In Situ with Pneumatic Rotatable Biopsy Mechanism and Hierarchical Airflow Management in Tortuous Luminal Pathways
In situ tissue biopsy with an endoluminal catheter is an efficient approach for disease diagnosis, featuring low invasiveness and few complications. However, the endoluminal catheter struggles to adjust the biopsy direction by distal endoscope bending or proximal twisting for tissue sampling within the tortuous luminal organs, due to friction-induced hysteresis and narrow spaces. Here, we propose a pneumatically-driven robotic catheter enabling the adjustment of the sampling direction without twisting the catheter for an accurate in situ omnidirectional biopsy. The distal end of the robotic catheter consists of a pneumatic bending actuator for the catheter's deployment in torturous luminal organs and a pneumatic rotatable biopsy mechanism (PRBM). By hierarchical airflow control, the PRBM can adjust the biopsy direction under low airflow and deploy the biopsy needle with higher airflow, allowing for rapid omnidirectional sampling of tissue in situ. This paper describes the design, modeling, and characterization of the proposed robotic catheter, including repeated deployment assessments of the biopsy needle, puncture force measurement, and validation via phantom tests. The PRBM prototype has six sampling directions evenly distributed across 360 degrees when actuated by a positive pressure of 0.3 MPa. The pneumatically-driven robotic catheter provides a novel biopsy strategy, potentially facilitating in situ multidirectional biopsies in tortuous luminal organs with minimum invasiveness.
UniConFlow: A Unified Constrained Generalization Framework for Certified Motion Planning with Flow Matching Models
Generative models have become increasingly powerful tools for robot motion generation, enabling flexible and multimodal trajectory generation across various tasks. Yet, most existing approaches remain limited in handling multiple types of constraints, such as collision avoidance and dynamic consistency, which are often treated separately or only partially considered. This paper proposes UniConFlow, a unified flow matching (FM) based framework for trajectory generation that systematically incorporates both equality and inequality constraints. UniConFlow introduces a novel prescribed-time zeroing function to enhance flexibility during the inference process, allowing the model to adapt to varying task requirements. To ensure constraint satisfaction, particularly with respect to obstacle avoidance, admissible action range, and kinodynamic consistency, the guidance inputs to the FM model are derived through a quadratic programming formulation, which enables constraint-aware generation without requiring retraining or auxiliary controllers. We conduct mobile navigation and high-dimensional manipulation tasks, demonstrating improved safety and feasibility compared to state-of-the-art constrained generative planners. Project page is available at https://uniconflow.github.io.
Online Performance Assessment of Multi-Source-Localization for Autonomous Driving Systems Using Subjective Logic
Autonomous driving (AD) relies heavily on high precision localization as a crucial part of all driving related software components. The precise positioning is necessary for the utilization of high-definition maps, prediction of other road participants and the controlling of the vehicle itself. Due to this reason, the localization is absolutely safety relevant. Typical errors of the localization systems, which are long term drifts, jumps and false localization, that must be detected to enhance safety. An online assessment and evaluation of the current localization performance is a challenging task, which is usually done by Kalman filtering for single localization systems. Current autonomous vehicles cope with these challenges by fusing multiple individual localization methods into an overall state estimation. Such approaches need expert knowledge for a competitive performance in challenging environments. This expert knowledge is based on the trust and the prioritization of distinct localization methods in respect to the current situation and environment. This work presents a novel online performance assessment technique of multiple localization systems by using subjective logic (SL). In our research vehicles, three different systems for localization are available, namely odometry-, Simultaneous Localization And Mapping (SLAM)- and Global Navigation Satellite System (GNSS)-based. Our performance assessment models the behavior of these three localization systems individually and puts them into reference of each other. The experiments were carried out using the CoCar NextGen, which is based on an Audi A6. The vehicle's localization system was evaluated under challenging conditions, specifically within a tunnel environment. The overall evaluation shows the feasibility of our approach.
comment: submitted to IEEE IAVVC 2025
Functionality Assessment Framework for Autonomous Driving Systems using Subjective Networks SC 2025
In complex autonomous driving (AD) software systems, the functioning of each system part is crucial for safe operation. By measuring the current functionality or operability of individual components an isolated glimpse into the system is given. Literature provides several of these detached assessments, often in the form of safety or performance measures. But dependencies, redundancies, error propagation and conflicting functionality statements do not allow for easy combination of these measures into a big picture of the functioning of the entire AD stack. Data is processed and exchanged between different components, each of which can fail, making an overall statement challenging. The lack of functionality assessment frameworks that tackle these problems underlines this complexity. This article presents a novel framework for inferring an overall functionality statement for complex component based systems by considering their dependencies, redundancies, error propagation paths and the assessments of individual components. Our framework first incorporates a comprehensive conversion to an assessment representation of the system. The representation is based on Subjective Networks (SNs) that allow for easy identification of faulty system parts. Second, the framework offers a flexible method for computing the system's functionality while dealing with contradicting assessments about the same component and dependencies, as well as redundancies, of the system. We discuss the framework's capabilities on real-life data of our AD stack with assessments of various components.
comment: submitted to IEEE ITSC 2025
Text-guided Generation of Efficient Personalized Inspection Plans
We propose a training-free, Vision-Language Model (VLM)-guided approach for efficiently generating trajectories to facilitate target inspection planning based on text descriptions. Unlike existing Vision-and-Language Navigation (VLN) methods designed for general agents in unknown environments, our approach specifically targets the efficient inspection of known scenes, with widespread applications in fields such as medical, marine, and civil engineering. Leveraging VLMs, our method first extracts points of interest (POIs) from the text description, then identifies a set of waypoints from which POIs are both salient and align with the spatial constraints defined in the prompt. Next, we interact with the VLM to iteratively refine the trajectory, preserving the visibility and prominence of the POIs. Further, we solve a Traveling Salesman Problem (TSP) to find the most efficient visitation order that satisfies the order constraint implied in the text description. Finally, we apply trajectory optimization to generate smooth, executable inspection paths for aerial and underwater vehicles. We have evaluated our method across a series of both handcrafted and real-world scanned environments. The results demonstrate that our approach effectively generates inspection planning trajectories that adhere to user instructions.
comment: 8 pages, 5 figures
FlySearch: Exploring how vision-language models explore
The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. We define three sets of scenarios with varying difficulty and observe that state-of-the-art VLMs cannot reliably solve even the simplest exploration tasks, with the gap to human performance increasing as the tasks get harder. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning. We publicly release the benchmark, scenarios, and the underlying codebase.
Automatic Operation of an Articulated Dump Truck: State Estimation by Combined QZSS CLAS and Moving-Base RTK Using Multiple GNSS Receivers
Labor shortage due to the declining birth rate has become a serious problem in the construction industry, and automation of construction work is attracting attention as a solution to this problem. This paper proposes a method to realize state estimation of dump truck position, orientation and articulation angle using multiple GNSS for automatic operation of dump trucks. RTK-GNSS is commonly used for automation of construction equipment, but in mountainous areas, mobile networks often unstable, and RTK-GNSS using GNSS reference stations cannot be used. Therefore, this paper develops a state estimation method for dump trucks that does not require a GNSS reference station by using the Centimeter Level Augmentation Service (CLAS) of the Japanese Quasi-Zenith Satellite System (QZSS). Although CLAS is capable of centimeter-level position estimation, its positioning accuracy and ambiguity fix rate are lower than those of RTK-GNSS. To solve this problem, we construct a state estimation method by factor graph optimization that combines CLAS positioning and moving-base RTK-GNSS between multiple GNSS antennas. Evaluation tests under real-world environments have shown that the proposed method can estimate the state of dump trucks with the same accuracy as conventional RTK-GNSS, but does not require a GNSS reference station.
comment: Accepted to the ION 2024 Pacific PNT Meeting
Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs
Task planning under uncertainty is essential for home-service robots operating in the real world. Tasks involve ambiguous human instructions, hidden or unknown object locations, and open-vocabulary object types, leading to significant open-ended uncertainty and a boundlessly large planning space. To address these challenges, we propose Tru-POMDP, a planner that combines structured belief generation using Large Language Models (LLMs) with principled POMDP planning. Tru-POMDP introduces a hierarchical Tree of Hypotheses (TOH), which systematically queries an LLM to construct high-quality particle beliefs over possible world states and human goals. We further formulate an open-ended POMDP model that enables rigorous Bayesian belief tracking and efficient belief-space planning over these LLM-generated hypotheses. Experiments on complex object rearrangement tasks across diverse kitchen environments show that Tru-POMDP significantly outperforms state-of-the-art LLM-based and LLM-tree-search hybrid planners, achieving higher success rates with significantly better plans, stronger robustness to ambiguity and occlusion, and greater planning efficiency.
Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games
The increasing proliferation of small UAVs in civilian and military airspace has raised critical safety and security concerns, especially when unauthorized or malicious drones enter restricted zones. In this work, we present a reinforcement learning (RL) framework for agile 1v1 quadrotor pursuit-evasion. We train neural network policies to command body rates and collective thrust, enabling high-speed pursuit and evasive maneuvers that fully exploit the quadrotor's nonlinear dynamics. To mitigate nonstationarity and catastrophic forgetting during adversarial co-training, we introduce an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm where, at each stage, either the pursuer or evader learns against a sampled opponent drawn from a growing population of past and current policies. This continual learning setup ensures monotonic performance improvement and retention of earlier strategies. Our results show that (i) rate-based policies achieve significantly higher capture rates and peak speeds than velocity-level baselines, and (ii) AMSPB yields stable, monotonic gains against a suite of benchmark opponents.
High-speed control and navigation for quadrupedal robots on complex and discrete terrain
High-speed legged navigation in discrete and geometrically complex environments is a challenging task because of the high-degree-of-freedom dynamics and long-horizon, nonconvex nature of the optimization problem. In this work, we propose a hierarchical navigation pipeline for legged robots that can traverse such environments at high speed. The proposed pipeline consists of a planner and tracker module. The planner module finds physically feasible foothold plans by sampling-based optimization with fast sequential filtering using heuristics and a neural network. Subsequently, rollouts are performed in a physics simulation to identify the best foothold plan regarding the engineered cost function and to confirm its physical consistency. This hierarchical planning module is computationally efficient and physically accurate at the same time. The tracker aims to accurately step on the target footholds from the planning module. During the training stage, the foothold target distribution is given by a generative model that is trained competitively with the tracker. This process ensures that the tracker is trained in an environment with the desired difficulty. The resulting tracker can overcome terrains that are more difficult than what the previous methods could manage. We demonstrated our approach using Raibo, our in-house dynamic quadruped robot. The results were dynamic and agile motions: Raibo is capable of running on vertical walls, jumping a 1.3-meter gap, running over stepping stones at 4 meters per second, and autonomously navigating on terrains full of 30{\deg} ramps, stairs, and boxes of various sizes.
Efficient Tactile Perception with Soft Electrical Impedance Tomography and Pre-trained Transformer
Tactile sensing is fundamental to robotic systems, enabling interactions through physical contact in multiple tasks. Despite its importance, achieving high-resolution, large-area tactile sensing remains challenging. Electrical Impedance Tomography (EIT) has emerged as a promising approach for large-area, distributed tactile sensing with minimal electrode requirements which can lend itself to addressing complex contact problems in robotics. However, existing EIT-based tactile reconstruction methods often suffer from high computational costs or depend on extensive annotated simulation datasets, hindering its viability in real-world settings. To address this shortcoming, here we propose a Pre-trained Transformer for EIT-based Tactile Reconstruction (PTET), a learning-based framework that bridges the simulation-to-reality gap by leveraging self-supervised pretraining on simulation data and fine-tuning with limited real-world data. In simulations, PTET requires 99.44 percent fewer annotated samples than equivalent state-of-the-art approaches (2,500 vs. 450,000 samples) while achieving reconstruction performance improvements of up to 43.57 percent under identical data conditions. Fine-tuning with real-world data further enables PTET to overcome discrepancies between simulated and experimental datasets, achieving superior reconstruction and detail recovery in practical scenarios. The improved reconstruction accuracy, data efficiency, and robustness in real-world tasks establish it as a scalable and practical solution for tactile sensing systems in robotics, especially for object handling and adaptive grasping under varying pressure conditions.
Optimization of Robotic Liquid Handling as a Capacitated Vehicle Routing Problem
We present an optimization strategy to reduce the execution time of liquid handling operations in the context of an automated chemical laboratory. By formulating the task as a capacitated vehicle routing problem (CVRP), we leverage heuristic solvers traditionally used in logistics and transportation planning to optimize task execution times. As exemplified using an 8-channel pipette with individually controllable tips, our approach demonstrates robust optimization performance across different labware formats (e.g., well-plates, vial holders), achieving up to a 37% reduction in execution time for randomly generated tasks compared to the baseline sorting method. We further apply the method to a real-world high-throughput materials discovery campaign and observe that 3 minutes of optimization time led to a reduction of 61 minutes in execution time compared to the best-performing sorting-based strategy. Our results highlight the potential for substantial improvements in throughput and efficiency in automated laboratories without any hardware modifications. This optimization strategy offers a practical and scalable solution to accelerate combinatorial experimentation in areas such as drug combination screening, reaction condition optimization, materials development, and formulation engineering.
Geometric Visual Servo Via Optimal Transport
When developing control laws for robotic systems, the principle factor when examining their performance is choosing inputs that allow smooth tracking to a reference input. In the context of robotic manipulation, this involves translating an object or end-effector from an initial pose to a target pose. Robotic manipulation control laws frequently use vision systems as an error generator to track features and produce control inputs. However, current control algorithms don't take into account the probabilistic features that are extracted and instead rely on hand-tuned feature extraction methods. Furthermore, the target features can exist in a static pose thus allowing a combined pose and feature error for control generation. We present a geometric control law for the visual servoing problem for robotic manipulators. The input from the camera constitutes a probability measure on the 3-dimensional Special Euclidean task-space group, where the Wasserstein distance between the current and desired poses is analogous with the geometric geodesic. From this, we develop a controller that allows for both pose and image-based visual servoing by combining classical PD control with gravity compensation with error minimization through the use of geodesic flows on a 3-dimensional Special Euclidean group. We present our results on a set of test cases demonstrating the generalisation ability of our approach to a variety of initial positions.
comment: 19 pages, 5 figures
Accelerating Model-Based Reinforcement Learning using Non-Linear Trajectory Optimization
This paper addresses the slow policy optimization convergence of Monte Carlo Probabilistic Inference for Learning Control (MC-PILCO), a state-of-the-art model-based reinforcement learning (MBRL) algorithm, by integrating it with iterative Linear Quadratic Regulator (iLQR), a fast trajectory optimization method suitable for nonlinear systems. The proposed method, Exploration-Boosted MC-PILCO (EB-MC-PILCO), leverages iLQR to generate informative, exploratory trajectories and initialize the policy, significantly reducing the number of required optimization steps. Experiments on the cart-pole task demonstrate that EB-MC-PILCO accelerates convergence compared to standard MC-PILCO, achieving up to $\bm{45.9\%}$ reduction in execution time when both methods solve the task in four trials. EB-MC-PILCO also maintains a $\bm{100\%}$ success rate across trials while solving the task faster, even in cases where MC-PILCO converges in fewer iterations.
Solving the Pod Repositioning Problem with Deep Reinforced Adaptive Large Neighborhood Search
The Pod Repositioning Problem (PRP) in Robotic Mobile Fulfillment Systems (RMFS) involves selecting optimal storage locations for pods returning from pick stations. This work presents an improved solution method that integrates Adaptive Large Neighborhood Search (ALNS) with Deep Reinforcement Learning (DRL). A DRL agent dynamically selects destroy and repair operators and adjusts key parameters such as destruction degree and acceptance thresholds during the search. Specialized heuristics for both operators are designed to reflect PRP-specific characteristics, including pod usage frequency and movement costs. Computational results show that this DRL-guided ALNS outperforms traditional approaches such as cheapest-place, fixed-place, binary integer programming, and static heuristics. The method demonstrates strong solution quality and illustrating the benefit of learning-driven control within combinatorial optimization for warehouse systems.
comment: 14 pages, 2 figures, conference
GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal
Existing semantic SLAM in dynamic environments mainly identify dynamic regions through object detection or semantic segmentation methods. However, in certain highly dynamic scenarios, the detection boxes or segmentation masks cannot fully cover dynamic regions. Therefore, this paper proposes a robust and efficient GeneA-SLAM2 system that leverages depth variance constraints to handle dynamic scenes. Our method extracts dynamic pixels via depth variance and creates precise depth masks to guide the removal of dynamic objects. Simultaneously, an autoencoder is used to reconstruct keypoints, improving the genetic resampling keypoint algorithm to obtain more uniformly distributed keypoints and enhance the accuracy of pose estimation. Our system was evaluated on multiple highly dynamic sequences. The results demonstrate that GeneA-SLAM2 maintains high accuracy in dynamic scenes compared to current methods. Code is available at: https://github.com/qingshufan/GeneA-SLAM2.
Stochastic Modeling of Road Hazards on Intersections and their Effect on Safety of Autonomous Vehicles
Autonomous vehicles (AV) look set to become common on our roads within the next few years. However, to achieve the final breakthrough, not only functional progress is required, but also satisfactory safety assurance must be provided. Among those, a question demanding special attention is the need to assess and quantify the overall safety of an AV. Such an assessment must consider on the one hand the imperfections of the AV functionality and on the other hand its interaction with the environment. In a previous paper we presented a model-based approach to AV safety assessment in which we use a probabilistic model to describe road hazards together with the impact on AV safety of imperfect behavior of AV functions, such as safety monitors and perception systems. With this model, we are able to quantify the likelihood of the occurrence of a fatal accident, for a single operating condition. In this paper, we extend the approach and show how the model can deal explicitly with a set of different operating conditions defined in a given ODD.
comment: This work has been submitted to the IEEE for possible publication
Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024
Visually impaired individuals face significant challenges navigating and interacting with unknown situations, particularly in tasks requiring spatial awareness and semantic scene understanding. To accelerate the development and evaluate the state of technologies that enable visually impaired people to solve these tasks, the Vision Assistance Race (VIS) at the Cybathlon 2024 competition was organized. In this work, we present Sight Guide, a wearable assistive system designed for the VIS. The system processes data from multiple RGB and depth cameras on an embedded computer that guides the user through complex, real-world-inspired tasks using vibration signals and audio commands. Our software architecture integrates classical robotics algorithms with learning-based approaches to enable capabilities such as obstacle avoidance, object detection, optical character recognition, and touchscreen interaction. In a testing environment, Sight Guide achieved a 95.7% task success rate, and further demonstrated its effectiveness during the Cybathlon competition. This work provides detailed insights into the system design, evaluation results, and lessons learned, and outlines directions towards a broader real-world applicability.
HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots IROS 2025
Mixed Reality (MR) interfaces have been extensively explored for controlling mobile robots, but there is limited research on their application to managing teams of robots. This paper presents HORUS: Holistic Operational Reality for Unified Systems, a Mixed Reality interface offering a comprehensive set of tools for managing multiple mobile robots simultaneously. HORUS enables operators to monitor individual robot statuses, visualize sensor data projected in real time, and assign tasks to single robots, subsets of the team, or the entire group, all from a Mini-Map (Ground Station). The interface also provides different teleoperation modes: a mini-map mode that allows teleoperation while observing the robot model and its transform on the mini-map, and a semi-immersive mode that offers a flat, screen-like view in either single or stereo view (3D). We conducted a user study in which participants used HORUS to manage a team of mobile robots tasked with finding clues in an environment, simulating search and rescue tasks. This study compared HORUS's full-team management capabilities with individual robot teleoperation. The experiments validated the versatility and effectiveness of HORUS in multi-robot coordination, demonstrating its potential to advance human-robot collaboration in dynamic, team-based environments.
comment: 7 pages, 7 figures, conference paper submitted to IROS 2025
Rodrigues Network for Learning Robot Actions
Understanding and predicting articulated actions is important in robot learning. However, common architectures such as MLPs and Transformers lack inductive biases that reflect the underlying kinematic structure of articulated systems. To this end, we propose the Neural Rodrigues Operator, a learnable generalization of the classical forward kinematics operation, designed to inject kinematics-aware inductive bias into neural computation. Building on this operator, we design the Rodrigues Network (RodriNet), a novel neural architecture specialized for processing actions. We evaluate the expressivity of our network on two synthetic tasks on kinematic and motion prediction, showing significant improvements compared to standard backbones. We further demonstrate its effectiveness in two realistic applications: (i) imitation learning on robotic benchmarks with the Diffusion Policy, and (ii) single-image 3D hand reconstruction. Our results suggest that integrating structured kinematic priors into the network architecture improves action learning in various domains.
Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
Symbiosis of Agents is a large-scale installation by Baoyang Chen that embeds AI-driven robots in an immersive, mirror-lined arena, probing the tension between machine agency and artistic authorship. Drawing on early cybernetics, rule-based conceptual art, and seminal robotic works, it orchestrates fluid exchanges among robotic arms, quadruped machines, their environment, and the public. A three tier faith system pilots the ecology: micro-level adaptive tactics, meso-level narrative drives, and a macro-level prime directive. This hierarchy lets behaviors evolve organically in response to environmental cues and even a viewer's breath, turning spectators into co-authors of the unfolding drama.Framed by a speculative terraforming scenario that recalls the historical exploitation of marginalized labor, the piece asks who bears responsibility in AI-mediated futures. Choreographed motion, AI-generated scripts, reactive lighting, and drifting fog cast the robots as collaborators rather than tools, forging a living, emergent artwork. Exhibited internationally, Symbiosis of Agents shows how cybernetic feedback, robotic experimentation, and conceptual rule-making can converge to redefine agency, authorship, and ethics in contemporary art.
A Hybrid Approach to Indoor Social Navigation: Integrating Reactive Local Planning and Proactive Global Planning ICRA 2025
We consider the problem of indoor building-scale social navigation, where the robot must reach a point goal as quickly as possible without colliding with humans who are freely moving around. Factors such as varying crowd densities, unpredictable human behavior, and the constraints of indoor spaces add significant complexity to the navigation task, necessitating a more advanced approach. We propose a modular navigation framework that leverages the strengths of both classical methods and deep reinforcement learning (DRL). Our approach employs a global planner to generate waypoints, assigning soft costs around anticipated pedestrian locations, encouraging caution around potential future positions of humans. Simultaneously, the local planner, powered by DRL, follows these waypoints while avoiding collisions. The combination of these planners enables the agent to perform complex maneuvers and effectively navigate crowded and constrained environments while improving reliability. Many existing studies on social navigation are conducted in simplistic or open environments, limiting the ability of trained models to perform well in complex, real-world settings. To advance research in this area, we introduce a new 2D benchmark designed to facilitate development and testing of social navigation strategies in indoor environments. We benchmark our method against traditional and RL-based navigation strategies, demonstrating that our approach outperforms both.
comment: Accepted at ICRA 2025
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations
Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for the transformation changes during the vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV feature, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on KITTI dataset, and (78.17%, 68.29%) on NuScenes dataset, in terms of (translation, rotation), respectively. In the open-source domain, it improves the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.
Sign Language: Towards Sign Understanding for Robot Autonomy
Signage is an ubiquitous element of human environments, playing a critical role in both scene understanding and navigation. For autonomous systems to fully interpret human environments, effectively parsing and understanding signs is essential. We introduce the task of navigational sign understanding, aimed at extracting navigational cues from signs that convey symbolic spatial information about the scene. Specifically, we focus on signs capturing directional cues that point toward distant locations and locational cues that identify specific places. To benchmark performance on this task, we curate a comprehensive test set, propose appropriate evaluation metrics, and establish a baseline approach. Our test set consists of over 160 images, capturing signs with varying complexity and design across a wide range of public spaces, such as hospitals, shopping malls, and transportation hubs. Our baseline approach harnesses Vision-Language Models (VLMs) to parse navigational signs under these high degrees of variability. Experiments show that VLMs offer promising performance on this task, potentially motivating downstream applications in robotics. The code and dataset are available on Github.
HiLO: High-Level Object Fusion for Autonomous Driving using Transformers
The fusion of sensor data is essential for a robust perception of the environment in autonomous driving. Learning-based fusion approaches mainly use feature-level fusion to achieve high performance, but their complexity and hardware requirements limit their applicability in near-production vehicles. High-level fusion methods offer robustness with lower computational requirements. Traditional methods, such as the Kalman filter, dominate this area. This paper modifies the Adapted Kalman Filter (AKF) and proposes a novel transformer-based high-level object fusion method called HiLO. Experimental results demonstrate improvements of $25.9$ percentage points in $\textrm{F}_1$ score and $6.1$ percentage points in mean IoU. Evaluation on a new large-scale real-world dataset demonstrates the effectiveness of the proposed approaches. Their generalizability is further validated by cross-domain evaluation between urban and highway scenarios. Code, data, and models are available at https://github.com/rst-tu-dortmund/HiLO .
comment: 6 pages, accepted at IEEE Intelligent Vehicles Symposium (IV) 2025
AURA: Agentic Upskilling via Reinforced Abstractions
We study the combinatorial explosion involved in translating high-level task prompts into deployable control policies for agile robots through multi-stage reinforcement learning. We introduce AURA (Agentic Upskilling via Reinforced Abstractions), a schema-centric curriculum RL framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated against a schema before any GPU time is consumed, ensuring reliable and efficient execution without human intervention. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine staged curricula based on prior training results stored in a vector database, supporting continual improvement over time. Ablation studies highlight the importance of retrieval for curriculum quality and convergence stability. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines on GPU-accelerated training frameworks. In qualitative tests, AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot across a range of environments. By abstracting away the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be prohibitively complex to construct by hand.
Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges
We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a source hand grasping an object, our goal is to synthesize a functionally equivalent grasp for a target hand without requiring paired demonstrations or hand-specific simulations. We frame this problem as a stochastic transport between grasp distributions using the Schr\"odinger Bridge formalism. Our method learns to map between source and target latent grasp spaces via score and flow matching, conditioned on visual observations. To guide this translation, we introduce physics-informed cost functions that encode alignment in base pose, contact maps, wrench space, and manipulability. Experiments across diverse hand-object pairs demonstrate our approach generates stable, physically grounded grasps with strong generalization. This work enables semantic grasp transfer for heterogeneous manipulators and bridges vision-based grasping with probabilistic generative modeling.
comment: 19 pages, 4 figures
Olfactory Inertial Odometry: Methodology for Effective Robot Navigation by Scent
Olfactory navigation is one of the most primitive mechanisms of exploration used by organisms. Navigation by machine olfaction (artificial smell) is a very difficult task to both simulate and solve. With this work, we define olfactory inertial odometry (OIO), a framework for using inertial kinematics, and fast-sampling olfaction sensors to enable navigation by scent analogous to visual inertial odometry (VIO). We establish how principles from SLAM and VIO can be extrapolated to olfaction to enable real-world robotic tasks. We demonstrate OIO with three different odour localization algorithms on a real 5-DoF robot arm over an odour-tracking scenario that resembles real applications in agriculture and food quality control. Our results indicate success in establishing a baseline framework for OIO from which other research in olfactory navigation can build, and we note performance enhancements that can be made to address more complex tasks in the future.
Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints
As UAV popularity soars, so does the mission planning associated with it. The classical approaches suffer from the triple problems of decoupled of task assignment and path planning, poor real-time performance and limited adaptability. Aiming at these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. Dubins path with multiple advantages bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques, task clustering preprocessing, highly efficient distance cost functions, low-complexity and less iterative task allocation strategies, are employed to guarantee the real-time performance of the algorithms. To cope with different emergencies and their simultaneous extremes, real-time planning of emerging tasks and mission replanning due to the reduction of available UAVs are appropriately handled. Finally, the developed algorithm is comprehensively exemplified and studied through simulations, highlighting that the proposed method only sacrifices 9.57% of the path length, while achieving a speed improvement of 4-5 orders of magnitude over the simulated annealing method, with a single mission planning of about 0.0003s.
SAVOR: Skill Affordance Learning from Visuo-Haptic Perception for Robot-Assisted Bite Acquisition
Robot-assisted feeding requires reliable bite acquisition, a challenging task due to the complex interactions between utensils and food with diverse physical properties. These interactions are further complicated by the temporal variability of food properties-for example, steak becomes firm as it cools even during a meal. To address this, we propose SAVOR, a novel approach for learning skill affordances for bite acquisition-how suitable a manipulation skill (e.g., skewering, scooping) is for a given utensil-food interaction. In our formulation, skill affordances arise from the combination of tool affordances (what a utensil can do) and food affordances (what the food allows). Tool affordances are learned offline through calibration, where different utensils interact with a variety of foods to model their functional capabilities. Food affordances are characterized by physical properties such as softness, moisture, and viscosity, initially inferred through commonsense reasoning using a visually-conditioned language model and then dynamically refined through online multi-modal visuo-haptic perception using SAVOR-Net during interaction. Our method integrates these offline and online estimates to predict skill affordances in real time, enabling the robot to select the most appropriate skill for each food item. Evaluated on 20 single-item foods and 10 in-the-wild meals, our approach improves bite acquisition success by 13% over state-of-the-art (SOTA) category-based methods (e.g. use skewer for fruits). These results highlight the importance of modeling interaction-driven skill affordances for generalizable and effective robot-assisted bite acquisition. Website: https://emprise.cs.cornell.edu/savor/
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planing, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our methods achieves 45.0 PDMS.
comment: 4pages
Beacon-Based Feedback Control for Parking an Active-Joint Center-Articulated Mobile Robot CEC
This paper presents an autonomous parking control strategy for an active-joint center-articulated mobile robot. We first derive a kinematic model of the robot, then propose a control law to stabilize the vehicle's configuration within a small neighborhood of the goal. The control law, designed using Lyapunov techniques, is based on the robot's polar coordinate equations. A beacon-based guidance system provides feedback on the target's position and orientation. Simulations demonstrate the robot's ability to park successfully from arbitrary initial poses.
comment: IEEE Conference - CCECE 2010
Improving Trajectory Stitching with Flow Models
Generative models have shown great promise as trajectory planners, given their affinity to modeling complex distributions and guidable inference process. Previous works have successfully applied these in the context of robotic manipulation but perform poorly when the required solution does not exist as a complete trajectory within the training set. We identify that this is a result of being unable to plan via stitching, and subsequently address the architectural and dataset choices needed to remedy this. On top of this, we propose a novel addition to the training and inference procedures to both stabilize and enhance these capabilities. We demonstrate the efficacy of our approach by generating plans with out of distribution boundary conditions and performing obstacle avoidance on the Franka Panda in simulation and on real hardware. In both of these tasks our method performs significantly better than the baselines and is able to avoid obstacles up to four times as large.
Offline Adaptation of Quadruped Locomotion using Diffusion Models
We present a diffusion-based approach to quadrupedal locomotion that simultaneously addresses the limitations of learning and interpolating between multiple skills and of (modes) offline adapting to new locomotion behaviours after training. This is the first framework to apply classifier-free guided diffusion to quadruped locomotion and demonstrate its efficacy by extracting goal-conditioned behaviour from an originally unlabelled dataset. We show that these capabilities are compatible with a multi-skill policy and can be applied with little modification and minimal compute overhead, i.e., running entirely on the robots onboard CPU. We verify the validity of our approach with hardware experiments on the ANYmal quadruped platform.
HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting
3D Gaussian Splatting offers expressive scene reconstruction, modeling a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based collaborative Gaussian Splatting method that leverages widely available ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data-streams with no prior knowledge of initial robot positions and varying on-device pose estimators. HAMMER consists of (i) a frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for open-vocabulary language queries. In our real-world experiments, HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is useful for downstream tasks, such as semantic goal-conditioned navigation (e.g., "go to the couch"). Accompanying content available at hammer-project.github.io.
MultiDLO: Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation
MultiDLO is a real-time algorithm for estimating the shapes of multiple, intertwining deformable linear objects (DLOs) from RGB-D image sequences. Unlike prior methods that track only a single DLO, MultiDLO simultaneously handles several objects. It uses the geodesic distance in the Global-Local Topology Preservation algorithm to define both inter-object identity and intra-object topology, ensuring entangled DLOs remain distinct with accurate local geometry. The MultiDLO algorithm is demonstrated on two challenging scenarios involving three entangling ropes, and the implementation is open-source and available for the community.
comment: 3 pages, 3 figures, presented at the 3rd Workshop on Representing and Manipulating Deformable Objects at the IEEE International Conference on Robotics and Automation. Video presentation [https://youtu.be/hfiqwMxitqA]. 3rd Workshop on Representing and Manipulating Deformable Objects [https://deformable-workshop.github.io/icra2023/]
Learning Collision Risk from Naturalistic Driving with Generalised Surrogate Safety Measures
Accurate and timely alerts for drivers or automated systems to unfolding collisions remains a challenge in road safety, particularly in highly interactive urban traffic. Existing approaches require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are useful only in the scenarios they are designed for. To address these limits, this study introduces the generalised surrogate safety measure (GSSM), a new approach that learns exclusively from naturalistic driving without crash or risk labels. GSSM captures the patterns of normal driving and estimates the extent to which a traffic interaction deviates from the norm towards unsafe extreme. Utilising neural networks, normal interactions are characterised by context-conditioned distributions of multi-directional spacing between road users. In the same interaction context, a spacing closer than normal entails higher risk of potential collision. Then a context-adaptive risk score and its associated probability can be calculated based on the theory of extreme values. Any measurable factors, such as motion kinematics, weather, lighting, can serve as part of the context, allowing for diverse coverage of safety-critical interactions. Multiple public driving datasets are used to train GSSMs, which are tested with 2,591 real-world crashes and near-crashes reconstructed from the SHRP2 NDS. A vanilla GSSM using only instantaneous states achieves AUPRC of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions. Additional data and contextual factors provide further performance gains. Across various interaction types such as rear-end, merging, and crossing, the accuracy and timeliness of GSSM consistently outperforms existing baselines. GSSM therefore establishes a scalable, context-aware, and generalisable foundation to proactively quantify collision risk in traffic interactions.
comment: 18 pages, 8 figures
Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames ICCV 2023
Most consumer cameras use rolling shutter (RS) exposure, which often leads to distortions such as skew and jelly effects. These videos are further limited by bandwidth and frame rate constraints. In this paper, we explore the potential of event cameras, which offer high temporal resolution. We propose a framework to recover global shutter (GS) high-frame-rate videos without RS distortion by combining an RS camera and an event camera. Due to the lack of real-world datasets, our framework adopts a self-supervised strategy based on a displacement field, a dense 3D spatiotemporal representation of pixel motion during exposure. This enables mutual reconstruction between RS and GS frames and facilitates slow-motion recovery. We combine RS frames with the displacement field to generate GS frames, and integrate inverse mapping and RS frame warping for self-supervision. Experiments on four datasets show that our method removes distortion, reduces bandwidth usage by 94 percent, and achieves 16 ms per frame at 32x interpolation.
comment: An earlier version of this paper (ID: 1845) was submitted to ICCV 2023 in March 2023. The work has been substantially revised and accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)
X-Driver: Explainable Autonomous Driving with Vision-Language Models
End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance in both open-loop and closed-loop settings than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. In this paper, we introduce X-Driver, a unified multi-modal large language models(MLLMs) framework designed for closed-loop autonomous driving, leveraging Chain-of-Thought(CoT) and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in CARLA simulation environment, including Bench2Drive[6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state-of-the-art(SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.
Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization
Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, explorations during learning usually results in safety violations, while the RL agent learns from those mistakes. On the other hand, safe control techniques ensure persistent safety satisfaction but demand strong priors on system dynamics, which is usually hard to obtain in practice. To address these problems, we present Safe Set Guided State-wise Constrained Policy Optimization (S-3PO), a pioneering algorithm generating state-wise safe optimal policies with zero training violations, i.e., learning without mistakes. S-3PO first employs a safety-oriented monitor with black-box dynamics to ensure safe exploration. It then enforces an "imaginary" cost for the RL agent to converge to optimal behaviors within safety constraints. S-3PO outperforms existing methods in high-dimensional robotics tasks, managing state-wise constraints with zero training violation. This innovation marks a significant stride towards real-world safe RL deployment.
comment: arXiv admin note: text overlap with arXiv:2306.12594
Continual Learning and Lifting of Koopman Dynamics for Linear Control of Legged Robots
The control of legged robots, particularly humanoid and quadruped robots, presents significant challenges due to their high-dimensional and nonlinear dynamics. While linear systems can be effectively controlled using methods like Model Predictive Control (MPC), the control of nonlinear systems remains complex. One promising solution is the Koopman Operator, which approximates nonlinear dynamics with a linear model, enabling the use of proven linear control techniques. However, achieving accurate linearization through data-driven methods is difficult due to issues like approximation error, domain shifts, and the limitations of fixed linear state-space representations. These challenges restrict the scalability of Koopman-based approaches. This paper addresses these challenges by proposing a continual learning algorithm designed to iteratively refine Koopman dynamics for high-dimensional legged robots. The key idea is to progressively expand the dataset and latent space dimension, enabling the learned Koopman dynamics to converge towards accurate approximations of the true system dynamics. Theoretical analysis shows that the linear approximation error of our method converges monotonically. Experimental results demonstrate that our method achieves high control performance on robots like Unitree G1/H1/A1/Go2 and ANYmal D, across various terrains using simple linear MPC controllers. This work is the first to successfully apply linearized Koopman dynamics for locomotion control of high-dimensional legged robots, enabling a scalable model-based control solution.
FF-SRL: High Performance GPU-Based Surgical Simulation For Robot Learning
Robotic surgery is a rapidly developing field that can greatly benefit from the automation of surgical tasks. However, training techniques such as Reinforcement Learning (RL) require a high number of task repetitions, which are generally unsafe and impractical to perform on real surgical systems. This stresses the need for simulated surgical environments, which are not only realistic, but also computationally efficient and scalable. We introduce FF-SRL (Fast and Flexible Surgical Reinforcement Learning), a high-performance learning environment for robotic surgery. In FF-SRL both physics simulation and RL policy training reside entirely on a single GPU. This avoids typical bottlenecks associated with data transfer between the CPU and GPU, leading to accelerated learning rates. Our results show that FF-SRL reduces the training time of a complex tissue manipulation task by an order of magnitude, down to a couple of minutes, compared to a common CPU/GPU simulator. Such speed-up may facilitate the experimentation with RL techniques and contribute to the development of new generation of surgical systems. To this end, we make our code publicly available to the community.
Cooperative Indoor Exploration Leveraging a Mixed-Size UAV Team with Heterogeneous Sensors
Heterogeneous teams of Unmanned Aerial Vehicles (UAVs) can enhance the exploration capabilities of aerial robots by exploiting different strengths and abilities of varying UAVs. This paper presents a novel method for exploring unknown indoor spaces with a team of UAVs of different sizes and sensory equipment. We propose a frontier-based exploration with two task allocation strategies: a greedy strategy that assigns Points of Interest (POIs) based on Euclidean distance and UAV priority and an optimization strategy that solves a minimum-cost flow problem. The proposed method utilizes the SphereMap algorithm to assess the accessibility of the POIs and generate paths that account for obstacle distances, including collision avoidance maneuvers among UAVs. The proposed approach was validated through simulation testing and real-world experiments that evaluated the method's performance on board the UAVs.
comment: IEEE CASE 2024, accepted version
LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation
Although Multi-Agent Reinforcement Learning (MARL) is effective for complex multi-robot tasks, it suffers from low sample efficiency and requires iterative manual reward tuning. Large Language Models (LLMs) have shown promise in single-robot settings, but their application in multi-robot systems remains largely unexplored. This paper introduces a novel LLM-Aided MARL (LAMARL) approach, which integrates MARL with LLMs, significantly enhancing sample efficiency without requiring manual design. LAMARL consists of two modules: the first module leverages LLMs to fully automate the generation of prior policy and reward functions. The second module is MARL, which uses the generated functions to guide robot policy training effectively. On a shape assembly benchmark, both simulation and real-world experiments demonstrate the unique advantages of LAMARL. Ablation studies show that the prior policy improves sample efficiency by an average of 185.9% and enhances task completion, while structured prompts based on Chain-of-Thought (CoT) and basic APIs improve LLM output success rates by 28.5%-67.5%. Videos and code are available at https://windylab.github.io/LAMARL/
comment: Accepted by IEEE Robotics and Automation Letters
Exploiting Local Observations for Robust Robot Learning
While many robotic tasks can be addressed through either centralized single-agent control with full state observation or decentralized multi-agent control, clear criteria for selecting the optimal approach are lacking. This paper presents a comprehensive investigation into how multi-agent reinforcement learning (MARL) with local observations can enhance robustness in complex robotic systems compared to traditional centralized control methods. We provide both theoretical analysis and empirical validation demonstrating that in certain tasks, decentralized MARL controllers can achieve performance comparable to centralized approaches while offering superior robustness against perturbations and agent failures. Our theoretical contributions include an analytical proof of equivalence between SARL and MARL under full observability conditions, identifying observability as the key distinguishing factor, and derivation of performance degradation bounds for locally observable policies under external perturbations. Empirical validation on standard MARL benchmarks confirms that locally observable MARL maintains competitive performance despite limited observations. Real-world experiments with a mobile manipulation robot demonstrate that our decentralized MARL controllers exhibit significantly improved robustness to both agent malfunctions and environmental disturbances compared to centralized baselines. This systematic investigation provides crucial insights for designing robust and generalizable control strategies in complex robotic systems, establishing MARL with local observations as a viable alternative to traditional centralized control paradigms.
comment: 10 pages, 9 figures
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Humans exhibit diverse and expressive whole-body movements. However, attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during real-world execution. To address these issues, we propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework that enables adversarial policy learning between upper and lower body. Specifically, the lower body aims to provide robust locomotion capabilities to follow velocity commands while the upper body tracks various motions. Conversely, the upper-body policy ensures effective motion tracking when the robot executes velocity-based movements. Through iterative updates, these policies achieve coordinated whole-body control, which can be extended to loco-manipulation tasks with teleoperation systems. Extensive experiments demonstrate that our method achieves robust locomotion and precise motion tracking in both simulation and on the full-size Unitree H1 robot. Additionally, we release a large-scale whole-body motion control dataset featuring high-quality episodic trajectories from MuJoCo simulations deployable on real robots. The project page is https://almi-humanoid.github.io.
comment: Code: https://github.com/TeleHuman/ALMI-Open, Dataset: https://huggingface.co/datasets/TeleEmbodied/ALMI-X
VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion
Recent success in legged robot locomotion is attributed to the integration of reinforcement learning and physical simulators. However, these policies often encounter challenges when deployed in real-world environments due to sim-to-real gaps, as simulators typically fail to replicate visual realism and complex real-world geometry. Moreover, the lack of realistic visual rendering limits the ability of these policies to support high-level tasks requiring RGB-based perception like ego-centric navigation. This paper presents a Real-to-Sim-to-Real framework that generates photorealistic and physically interactive "digital twin" simulation environments for visual navigation and locomotion learning. Our approach leverages 3D Gaussian Splatting (3DGS) based scene reconstruction from multi-view images and integrates these environments into simulations that support ego-centric visual perception and mesh-based physical interactions. To demonstrate its effectiveness, we train a reinforcement learning policy within the simulator to perform a visual goal-tracking task. Extensive experiments show that our framework achieves RGB-only sim-to-real policy transfer. Additionally, our framework facilitates the rapid adaptation of robot policies with effective exploration capability in complex new environments, highlighting its potential for applications in households and factories.
comment: Project Page: https://vr-robo.github.io/
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Robotic manipulation systems operating in diverse, dynamic environments must exhibit three critical abilities: multitask interaction, generalization to unseen scenarios, and spatial memory. While significant progress has been made in robotic manipulation, existing approaches often fall short in generalization to complex environmental variations and addressing memory-dependent tasks. To bridge this gap, we introduce SAM2Act, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from large-scale foundation model. SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark, and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations. Building on this foundation, we propose SAM2Act+, a memory-based architecture inspired by SAM2, which incorporates a memory bank, an encoder, and an attention mechanism to enhance spatial memory. To address the need for evaluating memory-dependent tasks, we introduce MemoryBench, a novel benchmark designed to assess spatial memory and action recall in robotic manipulation. SAM2Act+ achieves an average success rate of 94.3% on memory-based tasks in MemoryBench, significantly outperforming existing approaches and pushing the boundaries of memory-based robotic systems. Project page: sam2act.github.io.
comment: Including Appendix, Project Page: https://sam2act.github.io
STATE-NAV: Stability-Aware Traversability Estimation for Bipedal Navigation on Rough Terrain
Bipedal robots have advantages in maneuvering human-centered environments, but face greater failure risk compared to other stable mobile plarforms such as wheeled or quadrupedal robots. While learning-based traversability has been widely studied for these platforms, bipedal traversability has instead relied on manually designed rules with limited consideration of locomotion stability on rough terrain. In this work, we present the first learning-based traversability estimation and risk-sensitive navigation framework for bipedal robots operating in diverse, uneven environments. TravFormer, a transformer-based neural network, is trained to predict bipedal instability with uncertainty, enabling risk-aware and adaptive planning. Based on the network, we define traversability as stability-aware command velocity-the fastest command velocity that keeps instability below a user-defined limit. This velocity-based traversability is integrated into a hierarchical planner that combines traversability-informed Rapid Random Tree Star (TravRRT*) for time-efficient planning and Model Predictive Control (MPC) for safe execution. We validate our method in MuJoCo simulation, demonstrating improved navigation performance, with enhanced robustness and time efficiency across varying terrains compared to existing methods.
Back to Base: Towards Hands-Off Learning via Safe Resets with Reach-Avoid Safety Filters
Designing controllers that accomplish tasks while guaranteeing safety constraints remains a significant challenge. We often want an agent to perform well in a nominal task, such as environment exploration, while ensuring it can avoid unsafe states and return to a desired target by a specific time. In particular we are motivated by the setting of safe, efficient, hands-off training for reinforcement learning in the real world. By enabling a robot to safely and autonomously reset to a desired region (e.g., charging stations) without human intervention, we can enhance efficiency and facilitate training. Safety filters, such as those based on control barrier functions, decouple safety from nominal control objectives and rigorously guarantee safety. Despite their success, constructing these functions for general nonlinear systems with control constraints and system uncertainties remains an open problem. This paper introduces a safety filter obtained from the value function associated with the reach-avoid problem. The proposed safety filter minimally modifies the nominal controller while avoiding unsafe regions and guiding the system back to the desired target set. By preserving policy performance while allowing safe resetting, we enable efficient hands-off reinforcement learning and advance the feasibility of safe training for real world robots. We demonstrate our approach using a modified version of soft actor-critic to safely train a swing-up task on a modified cartpole stabilization problem.
comment: The first three authors contributed equally to the work
One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion
Humanoid locomotion faces a critical scalability challenge: traditional reinforcement learning (RL) methods require task-specific rewards and struggle to leverage growing datasets, even as more training terrains are introduced. We propose DreamPolicy, a unified framework that enables a single policy to master diverse terrains and generalize zero-shot to unseen scenarios by systematically integrating offline data and diffusion-driven motion synthesis. At its core, DreamPolicy introduces Humanoid Motion Imagery (HMI) - future state predictions synthesized through an autoregressive terrain-aware diffusion planner curated by aggregating rollouts from specialized policies across various distinct terrains. Unlike human motion datasets requiring laborious retargeting, our data directly captures humanoid kinematics, enabling the diffusion planner to synthesize "dreamed" trajectories that encode terrain-specific physical constraints. These trajectories act as dynamic objectives for our HMI-conditioned policy, bypassing manual reward engineering and enabling cross-terrain generalization. DreamPolicy addresses the scalability limitations of prior methods: while traditional RL fails to exploit growing datasets, our framework scales seamlessly with more offline data. As the dataset expands, the diffusion prior learns richer locomotion skills, which the policy leverages to master new terrains without retraining. Experiments demonstrate that DreamPolicy achieves average 90% success rates in training environments and an average of 20% higher success on unseen terrains than the prevalent method. It also generalizes to perturbed and composite scenarios where prior approaches collapse. By unifying offline data, diffusion-based trajectory synthesis, and policy optimization, DreamPolicy overcomes the "one task, one policy" bottleneck, establishing a paradigm for scalable, data-driven humanoid control.
Learning Autonomous Surgical Irrigation and Suction with the da Vinci Research Kit Using Reinforcement Learning
The irrigation-suction process is a common procedure to rinse and clean up the surgical field in minimally invasive surgery (MIS). In this process, surgeons first irrigate liquid, typically saline, into the surgical scene for rinsing and diluting the contaminant, and then suction the liquid out of the surgical field. While recent advances have shown promising results in the application of reinforcement learning (RL) for automating surgical subtasks, fewer studies have explored the automation of fluid-related tasks. In this work, we explore the automation of both steps in the irrigation-suction procedure and train two vision-based RL agents to complete irrigation and suction autonomously. To achieve this, a platform is developed for creating simulated surgical robot learning environments and for training agents, and two simulated learning environments are built for irrigation and suction with visually plausible fluid rendering capabilities. With techniques such as domain randomization (DR) and carefully designed reward functions, two agents are trained in the simulator and transferred to the real world. Individual evaluations of both agents show satisfactory real-world results. With an initial amount of around 5 grams of contaminants, the irrigation agent ultimately achieved an average of 2.21 grams remaining after a manual suction. As a comparison, fully manual operation by a human results in 1.90 grams remaining. The suction agent achieved 2.64 and 2.24 grams of liquid remaining across two trial groups with more than 20 and 30 grams of initial liquid in the container. Fully autonomous irrigation-suction trials reduce the contaminant in the container from around 5 grams to an average of 2.42 grams, although yielding a higher total weight remaining (4.40) due to residual liquid not suctioned. Further information about the project is available at https://tbs-ualberta.github.io/CRESSim/.
comment: 15 pages, 17 figures
Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction
Cutting-edge robot learning techniques including foundation models and imitation learning from humans all pose huge demands on large-scale and high-quality datasets which constitute one of the bottleneck in the general intelligent robot fields. This paper presents the Kaiwu multimodal dataset to address the missing real-world synchronized multimodal data problems in the sophisticated assembling scenario,especially with dynamics information and its fine-grained labelling. The dataset first provides an integration of human,environment and robot data collection framework with 20 subjects and 30 interaction objects resulting in totally 11,664 instances of integrated actions. For each of the demonstration,hand motions,operation pressures,sounds of the assembling process,multi-view videos, high-precision motion capture information,eye gaze with first-person videos,electromyography signals are all recorded. Fine-grained multi-level annotation based on absolute timestamp,and semantic segmentation labelling are performed. Kaiwu dataset aims to facilitate robot learning,dexterous manipulation,human intention investigation and human-robot collaboration research.
comment: 8 pages, 5 figures, Submitted to IEEE Robotics and Automation Letters (RAL)
Multiagent Systems
MAEBE: Multi-Agent Emergent Behavior Framework ICML 2025
Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.
comment: Preprint. This work has been submitted to the Multi-Agent Systems Workshop at ICML 2025 for review
Adaptive Graph Pruning for Multi-Agent Communication
Large Language Model (LLM) based multi-agent systems have shown remarkable performance in various tasks, especially when enhanced through collaborative communication. However, current methods often rely on a fixed number of agents and static communication structures, limiting their ability to adapt to varying task complexities. In this paper, we propose Adaptive Graph Pruning (AGP), a novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning). Specifically, our method employs a two-stage training strategy: firstly, independently training soft-pruning networks for different agent quantities to determine optimal agent-quantity-specific complete graphs and positional masks across specific tasks; and then jointly optimizing hard-pruning and soft-pruning within a maximum complete graph to dynamically configure the number of agents and their communication topologies per task. Extensive experiments demonstrate that our approach is: (1) High-performing, achieving state-of-the-art results across six benchmarks and consistently generalizes across multiple mainstream LLM architectures, with a increase in performance of $2.58\%\sim 9.84\%$; (2) Task-adaptive, dynamically constructing optimized communication topologies tailored to specific tasks, with an extremely high performance in all three task categories (general reasoning, mathematical reasoning, and code generation); (3) Token-economical, having fewer training steps and token consumption at the same time, with a decrease in token consumption of $90\%+$; and (4) Training-efficient, achieving high performance with very few training steps compared with other methods. The performance will surpass the existing baselines after about ten steps of training under six benchmarks.
ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms
This paper presents ThinkTank, a comprehensive and scalable framework designed to transform specialized AI agent systems into versatile collaborative intelligence platforms capable of supporting complex problem-solving across diverse domains. ThinkTank systematically generalizes agent roles, meeting structures, and knowledge integration mechanisms by adapting proven scientific collaboration methodologies. Through role abstraction, generalization of meeting types for iterative collaboration, and the integration of Retrieval-Augmented Generation with advanced knowledge storage, the framework facilitates expertise creation and robust knowledge sharing. ThinkTank enables organizations to leverage collaborative AI for knowledge-intensive tasks while ensuring data privacy and security through local deployment, utilizing frameworks like Ollama with models such as Llama3.1. The ThinkTank framework is designed to deliver significant advantages in cost-effectiveness, data security, scalability, and competitive positioning compared to cloud-based alternatives, establishing it as a universal platform for AI-driven collaborative problem-solving. The ThinkTank code is available at https://github.com/taugroup/ThinkTank
Systems and Control (CS)
Backpressure-based Mean-field Type Game for Scheduling in Multi-Hop Wireless Sensor Networks
We propose a Mean-Field Type Game (MFTG) framework for effective scheduling in multi-hop wireless sensor networks (WSNs) using backpressure as a performance criterion. Traditional backpressure algorithms leverage queue differentials to regulate data flow and maintain network stability. In this work, we extend the backpressure framework by incorporating a mean-field term into the cost functional, capturing the global behavior of the system alongside local dynamics. The resulting model utilizes the strengths of non-cooperative mean-field type games, enabling nodes to make decentralized decisions based on both individual queue states and system mean-field effects while accounting for stochastic network interactions. By leveraging the interplay between backpressure dynamics and mean-field coupling, the approach balances local optimization with global efficiency. Numerical simulations demonstrate the efficacy of the proposed method in handling congestion and scheduling in large-scale WSNs.
comment: Accepted to the 33rd European Signal Processing Conference (EUSIPCO 2025)
Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles
Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such 'newly' sensed data leads to online learning and demands carefully crafting the trajectories. Besides, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm to take the benefits of continual sensed data and limited onboard resources of the ACVs. In particular, considering independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers comparable performances to its non-pruned and nonquantized, hence, computation- and communication-inefficient counterparts.
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge ATC 2025
Deploying large language models (LLMs) on edge devices is crucial for delivering fast responses and ensuring data privacy. However, the limited storage, weight, and power of edge devices make it difficult to deploy LLM-powered applications. These devices must balance latency requirements with energy consumption and model accuracy. In this paper, we first quantify the challenges of deploying LLMs on off-the-shelf edge devices and then we present CLONE, an in-depth algorithm-hardware co-design at both the model- and system-level that intelligently integrates real-time, energy optimization while maintaining robust generality. In order to maximize the synergistic benefits of these algorithms in always-on and intermediate edge computing settings, we specialize in a 28nm scalable hardware accelerator system. We implement and extensively evaluate CLONE on two off-the-shelf edge platforms. Experiments show that CLONE effectively accelerates the inference process up to 11.92x, and saves energy up to 7.36x, while maintaining high-generation.
comment: Accepted by USENIX ATC 2025
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances between stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
On dual-rate consensus under transmission delays
In this paper, we investigate the problem of dual-rate consensus under transmission delays, where the control updates happen at a faster rate than the measurements being received. We assume that the measurements are delayed by a fixed delay and show that for all delays and rates, the system reaches a consensus if and only if the communication graph of the agents is connected and the control gain is chosen in a specific interval. Based on these results we dive deeper into the convergence properties and investigate how the convergence changes when we change the rate for sending measurements. We observe that in certain cases there exists a sweet spot for choosing the sampling rate of the measurements, which can improve the convergence to the consensus point. We then formulate an optimization problem to find a sampling rate to improve the convergence speed and provide a necessary and sufficient condition for the existence of a finite optimizer of this problem. Our results are verified with numerical simulations.
comment: To be presented at the 2025 European Control Conference, 9 pages, 3 figures
Target Sensing Performance in Disaster-Specific ISAC Networks
As sixth-generation (6G) wireless technology emerges, integrated sensing and communication (ISAC) networks offer significant potential for enhancing real-time monitoring in disaster areas. However, existing ISAC approaches often fail to address the unique challenges of dynamic and cluttered disaster areas, resulting in limited sensing coverage and interruptions in sensing service. To address these limitations, this work proposes a mobile ISAC network specifically designed for disaster scenarios. By leveraging stochastic geometry, we derive closed-form expressions for sensing coverage and introduce a novel performance metric to evaluate sensing service continuity. Simulation results validate the analytical derivations and offer key insights into network design.
Quantized Dissipative Uncertain Model for Fractional T_S Fuzzy systems with Time_Varying Delays Under Networked Control System
This paper addressed with the quantized dissipative uncertain problem for delayed fractional T_S Fuzzy system for event_triggered networked systems (E_NS), where the extended dissipativity analysis combines the H infinity, dissipativity, L2 and L infinity and passivity performance in a unified frame. To attain the high efficiency for available channel resources, measurement size decrease mechanism and event_triggered scheme (ETS) are proposed. Firstly, we present the ETS in which signal is transmitted through the channel with logical function then logarithmic quantization methodology is implemented for size reduction. Then, we transfer the original delayed fractional T_S fuzzy systems with the effect of quantization under ETS as induced communications delays. Furthermore, by employing the associative Lyapunov functional method in terms of linear matrix inequalities, adequate conditions for asymptotical stability is given. Moreover, we also construct the design fuzzy model for state space filtering system. At last, a truck_trailer model is given to show the effectiveness of the proposed strategy.
Geometric Visual Servo Via Optimal Transport
When developing control laws for robotic systems, the principle factor when examining their performance is choosing inputs that allow smooth tracking to a reference input. In the context of robotic manipulation, this involves translating an object or end-effector from an initial pose to a target pose. Robotic manipulation control laws frequently use vision systems as an error generator to track features and produce control inputs. However, current control algorithms don't take into account the probabilistic features that are extracted and instead rely on hand-tuned feature extraction methods. Furthermore, the target features can exist in a static pose thus allowing a combined pose and feature error for control generation. We present a geometric control law for the visual servoing problem for robotic manipulators. The input from the camera constitutes a probability measure on the 3-dimensional Special Euclidean task-space group, where the Wasserstein distance between the current and desired poses is analogous with the geometric geodesic. From this, we develop a controller that allows for both pose and image-based visual servoing by combining classical PD control with gravity compensation with error minimization through the use of geodesic flows on a 3-dimensional Special Euclidean group. We present our results on a set of test cases demonstrating the generalisation ability of our approach to a variety of initial positions.
comment: 19 pages, 5 figures
Recursive Privacy-Preserving Estimation Over Markov Fading Channels
In industrial applications, the presence of moving machinery, vehicles, and personnel, contributes to the dynamic nature of the wireless channel. This time variability induces channel fading, which can be effectively modeled using a Markov fading channel (MFC). In this paper, we investigate the problem of secure state estimation for systems that communicate over a MFC in the presence of an eavesdropper. The objective is to enable a remote authorized user to accurately estimate the states of a dynamic system, while considering the potential interception of the sensor's packet through a wiretap channel. To prevent information leakage, a novel co-design strategy is established, which combines a privacy-preserving mechanism with a state estimator. To implement our encoding scheme, a nonlinear mapping of the innovation is introduced based on the weighted reconstructed innovation previously received by the legitimate user. Corresponding to this encoding scheme, we design a recursive privacy-preserving filtering algorithm to achieve accurate estimation. The boundedness of estimation error dynamics at the legitimate user's side is discussed and the divergence of the eavesdropper's estimation error is analyzed, which demonstrates the effectiveness of our co-design strategy in ensuring secrecy. Furthermore, a simulation example of a three-tank system is provided to demonstrate the effectiveness and feasibility of our privacy-preserving estimation method.
comment: 12 pages, 5 figures
Unit Commitment with Cost-Oriented Temporal Resolution
Time-adaptive unit commitment (UC) has recently been investigated to reduce the scheduling costs by flexibly varying the temporal resolution, which is usually determined by clustering the net load patterns. However, there exists a misalignment between cost and net load patterns due to the discrete start-up costs and out-of-merit-order dispatch triggered by ramping and other constraints. The optimal time-adaptive resolution cannot be completely captured by clustering-based method. This paper proposes a cost-oriented method to address this misalignment by a novel bilevel optimization approach that is efficiently solved through a heuristic greedy algorithm. The impact of varying temporal resolution on the final scheduling costs are tested, based on which the temporal resolution is heuristically updated, achieving significant cost reduction without increasing the number of temporal periods. Subsequently, an improved discretized Adam optimization method together with offline warm start and online refinement strategy is proposed to efficiently search for the better temporal resolution configuration. Results show that the proposed cost-oriented UC temporal resolution determination method achieves enhanced cost efficiency.
Olfactory Inertial Odometry: Methodology for Effective Robot Navigation by Scent
Olfactory navigation is one of the most primitive mechanisms of exploration used by organisms. Navigation by machine olfaction (artificial smell) is a very difficult task to both simulate and solve. With this work, we define olfactory inertial odometry (OIO), a framework for using inertial kinematics, and fast-sampling olfaction sensors to enable navigation by scent analogous to visual inertial odometry (VIO). We establish how principles from SLAM and VIO can be extrapolated to olfaction to enable real-world robotic tasks. We demonstrate OIO with three different odour localization algorithms on a real 5-DoF robot arm over an odour-tracking scenario that resembles real applications in agriculture and food quality control. Our results indicate success in establishing a baseline framework for OIO from which other research in olfactory navigation can build, and we note performance enhancements that can be made to address more complex tasks in the future.
Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints
As UAV popularity soars, so does the mission planning associated with it. The classical approaches suffer from the triple problems of decoupled of task assignment and path planning, poor real-time performance and limited adaptability. Aiming at these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. Dubins path with multiple advantages bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques, task clustering preprocessing, highly efficient distance cost functions, low-complexity and less iterative task allocation strategies, are employed to guarantee the real-time performance of the algorithms. To cope with different emergencies and their simultaneous extremes, real-time planning of emerging tasks and mission replanning due to the reduction of available UAVs are appropriately handled. Finally, the developed algorithm is comprehensively exemplified and studied through simulations, highlighting that the proposed method only sacrifices 9.57% of the path length, while achieving a speed improvement of 4-5 orders of magnitude over the simulated annealing method, with a single mission planning of about 0.0003s.
Beacon-Based Feedback Control for Parking an Active-Joint Center-Articulated Mobile Robot CEC
This paper presents an autonomous parking control strategy for an active-joint center-articulated mobile robot. We first derive a kinematic model of the robot, then propose a control law to stabilize the vehicle's configuration within a small neighborhood of the goal. The control law, designed using Lyapunov techniques, is based on the robot's polar coordinate equations. A beacon-based guidance system provides feedback on the target's position and orientation. Simulations demonstrate the robot's ability to park successfully from arbitrary initial poses.
comment: IEEE Conference - CCECE 2010
Probabilistic Net Load Forecasting for High-Penetration RES Grids Utilizing Enhanced Conditional Diffusion Model
The proliferation of intermittent distributed renewable energy sources (RES) in modern power systems has fundamentally compromised the reliability and accuracy of deterministic net load forecasting. Generative models, particularly diffusion models, demonstrate exceptional potential in uncertainty quantification for scenario forecasting. Nevertheless, their probabilistic predictive capabilities and conditional bootstrapping mechanisms still remain underexplored. In this paper, a day-ahead probabilistic net load forecasting framework is developed by systematically quantifying epistemic uncertainty and aleatoric variability using the feature-informed enhanced conditional diffusion model (ECDM). The ECDM architecture implements the net load distribution generation process using an imputation-based conditional diffusion model, where multi-modal conditional inputs, such as weather and calendar data, are fused via cross-attention mechanisms. Specifically, historical net load profiles are utilized to guide the reverse diffusion trajectory through non-parametric imputation operators preserving spatial-temporal integrity. To capture periodic characteristics, a novel weekly arrangement method is also introduced, while an unconditional model is integrated to ensure diversity in the generated scenarios. Subsequently, the maximum probabilistic points and probability intervals of predicted net load are obtained by the adaptive kernel density estimation under RES intermittency. Moreover, ECDM is extented to multi-energy forecast framework, attempting to increase interpretability of the net load predictions. Numerical experiments on a publicly available dataset demonstrate the superior forecasting performance of the proposed method compared to existing state-of-the-art approaches.
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors, and are often pushed (through e.g. policy entropy regularisation) to randomise their actions in favor of exploration. This often makes it challenging for other agents and humans to predict an agent's behavior, triggering unsafe scenarios (e.g. in human-robot interaction). We propose a novel method to induce predictable behavior in RL agents, termed Predictability-Aware RL (PARL), employing the agent's trajectory entropy rate to quantify predictability. Our method maximizes a linear combination of a standard discounted reward and the negative entropy rate, thus trading off optimality with predictability. We show how the entropy rate can be formally cast as an average reward, how entropy-rate value functions can be estimated from a learned model and incorporate this in policy-gradient algorithms, and demonstrate how this approach produces predictable (near-optimal) policies in tasks inspired by human-robot use-cases.
Low-Rank Matrix Regression via Least-Angle Regression
Low-rank matrix regression is a fundamental problem in data science with various applications in systems and control. Nuclear norm regularization has been widely applied to solve this problem due to its convexity. However, it suffers from high computational complexity and the inability to directly specify the rank. This work introduces a novel framework for low-rank matrix regression that addresses both unstructured and Hankel matrices. By decomposing the low-rank matrix into rank-1 bases, the problem is reformulated as an infinite-dimensional sparse learning problem. The least-angle regression (LAR) algorithm is then employed to solve this problem efficiently. For unstructured matrices, a closed-form LAR solution is derived with equivalence to a normalized nuclear norm regularization problem. For Hankel matrices, a real-valued polynomial basis reformulation enables effective LAR implementation. Two numerical examples in network modeling and system realization demonstrate that the proposed approach significantly outperforms the nuclear norm method in terms of estimation accuracy and computational efficiency.
Data-assimilated model-informed reinforcement learning
The control of spatio-temporally chaos is challenging because of high dimensionality and unpredictability. Model-free reinforcement learning (RL) discovers optimal control policies by interacting with the system, typically requiring observations of the full physical state. In practice, sensors often provide only partial and noisy measurements (observations) of the system. The objective of this paper is to develop a framework that enables the control of chaotic systems with partial and noisy observability. The proposed method, data-assimilated model-informed reinforcement learning (DA-MIRL), integrates (i) low-order models to approximate high-dimensional dynamics; (ii) sequential data assimilation to correct the model prediction when observations become available; and (iii) an off-policy actor-critic RL algorithm to adaptively learn an optimal control strategy based on the corrected state estimates. We test DA-MIRL on the spatiotemporally chaotic solutions of the Kuramoto-Sivashinsky equation. We estimate the full state of the environment with (i) a physics-based model, here, a coarse-grained model; and (ii) a data-driven model, here, the control-aware echo state network, which is proposed in this paper. We show that DA-MIRL successfully estimates and suppresses the chaotic dynamics of the environment in real time from partial observations and approximate models. This work opens opportunities for the control of partially observable chaotic systems.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool's independent features, such as dependency graph generation, are implemented in a Python wrapper.
Joint Learning of Linear Dynamical Systems under Smoothness Constraints
We consider the problem of joint learning of multiple linear dynamical systems. This has received significant attention recently under different types of assumptions on the model parameters. The setting we consider involves a collection of $m$ linear systems, each of which resides on a node of a given undirected graph $G = ([m], \mathcal{E})$. We assume that the system matrices are marginally stable, and satisfy a smoothness constraint w.r.t $G$ -- akin to the quadratic variation of a signal on a graph. Given access to the states of the nodes over $T$ time points, we then propose two estimators for joint estimation of the system matrices, along with non-asymptotic error bounds on the mean-squared error (MSE). In particular, we show conditions under which the MSE converges to zero as $m$ increases, typically polynomially fast w.r.t $m$. The results hold under mild (i.e., $T \sim \log m$), or sometimes, even no assumption on $T$ (i.e. $T \geq 2$).
comment: 37 pages, 1 figure, corrected typos and made minor edits throughout, added remarks after Corollaries 1,2 and 3
Back to Base: Towards Hands-Off Learning via Safe Resets with Reach-Avoid Safety Filters
Designing controllers that accomplish tasks while guaranteeing safety constraints remains a significant challenge. We often want an agent to perform well in a nominal task, such as environment exploration, while ensuring it can avoid unsafe states and return to a desired target by a specific time. In particular we are motivated by the setting of safe, efficient, hands-off training for reinforcement learning in the real world. By enabling a robot to safely and autonomously reset to a desired region (e.g., charging stations) without human intervention, we can enhance efficiency and facilitate training. Safety filters, such as those based on control barrier functions, decouple safety from nominal control objectives and rigorously guarantee safety. Despite their success, constructing these functions for general nonlinear systems with control constraints and system uncertainties remains an open problem. This paper introduces a safety filter obtained from the value function associated with the reach-avoid problem. The proposed safety filter minimally modifies the nominal controller while avoiding unsafe regions and guiding the system back to the desired target set. By preserving policy performance while allowing safe resetting, we enable efficient hands-off reinforcement learning and advance the feasibility of safe training for real world robots. We demonstrate our approach using a modified version of soft actor-critic to safely train a swing-up task on a modified cartpole stabilization problem.
comment: The first three authors contributed equally to the work
Systems and Control (EESS)
Backpressure-based Mean-field Type Game for Scheduling in Multi-Hop Wireless Sensor Networks
We propose a Mean-Field Type Game (MFTG) framework for effective scheduling in multi-hop wireless sensor networks (WSNs) using backpressure as a performance criterion. Traditional backpressure algorithms leverage queue differentials to regulate data flow and maintain network stability. In this work, we extend the backpressure framework by incorporating a mean-field term into the cost functional, capturing the global behavior of the system alongside local dynamics. The resulting model utilizes the strengths of non-cooperative mean-field type games, enabling nodes to make decentralized decisions based on both individual queue states and system mean-field effects while accounting for stochastic network interactions. By leveraging the interplay between backpressure dynamics and mean-field coupling, the approach balances local optimization with global efficiency. Numerical simulations demonstrate the efficacy of the proposed method in handling congestion and scheduling in large-scale WSNs.
comment: Accepted to the 33rd European Signal Processing Conference (EUSIPCO 2025)
Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles
Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such 'newly' sensed data leads to online learning and demands carefully crafting the trajectories. Besides, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm to take the benefits of continual sensed data and limited onboard resources of the ACVs. In particular, considering independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers comparable performances to its non-pruned and nonquantized, hence, computation- and communication-inefficient counterparts.
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge ATC 2025
Deploying large language models (LLMs) on edge devices is crucial for delivering fast responses and ensuring data privacy. However, the limited storage, weight, and power of edge devices make it difficult to deploy LLM-powered applications. These devices must balance latency requirements with energy consumption and model accuracy. In this paper, we first quantify the challenges of deploying LLMs on off-the-shelf edge devices and then we present CLONE, an in-depth algorithm-hardware co-design at both the model- and system-level that intelligently integrates real-time, energy optimization while maintaining robust generality. In order to maximize the synergistic benefits of these algorithms in always-on and intermediate edge computing settings, we specialize in a 28nm scalable hardware accelerator system. We implement and extensively evaluate CLONE on two off-the-shelf edge platforms. Experiments show that CLONE effectively accelerates the inference process up to 11.92x, and saves energy up to 7.36x, while maintaining high-generation.
comment: Accepted by USENIX ATC 2025
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances between stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
On dual-rate consensus under transmission delays
In this paper, we investigate the problem of dual-rate consensus under transmission delays, where the control updates happen at a faster rate than the measurements being received. We assume that the measurements are delayed by a fixed delay and show that for all delays and rates, the system reaches a consensus if and only if the communication graph of the agents is connected and the control gain is chosen in a specific interval. Based on these results we dive deeper into the convergence properties and investigate how the convergence changes when we change the rate for sending measurements. We observe that in certain cases there exists a sweet spot for choosing the sampling rate of the measurements, which can improve the convergence to the consensus point. We then formulate an optimization problem to find a sampling rate to improve the convergence speed and provide a necessary and sufficient condition for the existence of a finite optimizer of this problem. Our results are verified with numerical simulations.
comment: To be presented at the 2025 European Control Conference, 9 pages, 3 figures
Target Sensing Performance in Disaster-Specific ISAC Networks
As sixth-generation (6G) wireless technology emerges, integrated sensing and communication (ISAC) networks offer significant potential for enhancing real-time monitoring in disaster areas. However, existing ISAC approaches often fail to address the unique challenges of dynamic and cluttered disaster areas, resulting in limited sensing coverage and interruptions in sensing service. To address these limitations, this work proposes a mobile ISAC network specifically designed for disaster scenarios. By leveraging stochastic geometry, we derive closed-form expressions for sensing coverage and introduce a novel performance metric to evaluate sensing service continuity. Simulation results validate the analytical derivations and offer key insights into network design.
Quantized Dissipative Uncertain Model for Fractional T_S Fuzzy systems with Time_Varying Delays Under Networked Control System
This paper addressed with the quantized dissipative uncertain problem for delayed fractional T_S Fuzzy system for event_triggered networked systems (E_NS), where the extended dissipativity analysis combines the H infinity, dissipativity, L2 and L infinity and passivity performance in a unified frame. To attain the high efficiency for available channel resources, measurement size decrease mechanism and event_triggered scheme (ETS) are proposed. Firstly, we present the ETS in which signal is transmitted through the channel with logical function then logarithmic quantization methodology is implemented for size reduction. Then, we transfer the original delayed fractional T_S fuzzy systems with the effect of quantization under ETS as induced communications delays. Furthermore, by employing the associative Lyapunov functional method in terms of linear matrix inequalities, adequate conditions for asymptotical stability is given. Moreover, we also construct the design fuzzy model for state space filtering system. At last, a truck_trailer model is given to show the effectiveness of the proposed strategy.
Geometric Visual Servo Via Optimal Transport
When developing control laws for robotic systems, the principle factor when examining their performance is choosing inputs that allow smooth tracking to a reference input. In the context of robotic manipulation, this involves translating an object or end-effector from an initial pose to a target pose. Robotic manipulation control laws frequently use vision systems as an error generator to track features and produce control inputs. However, current control algorithms don't take into account the probabilistic features that are extracted and instead rely on hand-tuned feature extraction methods. Furthermore, the target features can exist in a static pose thus allowing a combined pose and feature error for control generation. We present a geometric control law for the visual servoing problem for robotic manipulators. The input from the camera constitutes a probability measure on the 3-dimensional Special Euclidean task-space group, where the Wasserstein distance between the current and desired poses is analogous with the geometric geodesic. From this, we develop a controller that allows for both pose and image-based visual servoing by combining classical PD control with gravity compensation with error minimization through the use of geodesic flows on a 3-dimensional Special Euclidean group. We present our results on a set of test cases demonstrating the generalisation ability of our approach to a variety of initial positions.
comment: 19 pages, 5 figures
Recursive Privacy-Preserving Estimation Over Markov Fading Channels
In industrial applications, the presence of moving machinery, vehicles, and personnel, contributes to the dynamic nature of the wireless channel. This time variability induces channel fading, which can be effectively modeled using a Markov fading channel (MFC). In this paper, we investigate the problem of secure state estimation for systems that communicate over a MFC in the presence of an eavesdropper. The objective is to enable a remote authorized user to accurately estimate the states of a dynamic system, while considering the potential interception of the sensor's packet through a wiretap channel. To prevent information leakage, a novel co-design strategy is established, which combines a privacy-preserving mechanism with a state estimator. To implement our encoding scheme, a nonlinear mapping of the innovation is introduced based on the weighted reconstructed innovation previously received by the legitimate user. Corresponding to this encoding scheme, we design a recursive privacy-preserving filtering algorithm to achieve accurate estimation. The boundedness of estimation error dynamics at the legitimate user's side is discussed and the divergence of the eavesdropper's estimation error is analyzed, which demonstrates the effectiveness of our co-design strategy in ensuring secrecy. Furthermore, a simulation example of a three-tank system is provided to demonstrate the effectiveness and feasibility of our privacy-preserving estimation method.
comment: 12 pages, 5 figures
Unit Commitment with Cost-Oriented Temporal Resolution
Time-adaptive unit commitment (UC) has recently been investigated to reduce the scheduling costs by flexibly varying the temporal resolution, which is usually determined by clustering the net load patterns. However, there exists a misalignment between cost and net load patterns due to the discrete start-up costs and out-of-merit-order dispatch triggered by ramping and other constraints. The optimal time-adaptive resolution cannot be completely captured by clustering-based method. This paper proposes a cost-oriented method to address this misalignment by a novel bilevel optimization approach that is efficiently solved through a heuristic greedy algorithm. The impact of varying temporal resolution on the final scheduling costs are tested, based on which the temporal resolution is heuristically updated, achieving significant cost reduction without increasing the number of temporal periods. Subsequently, an improved discretized Adam optimization method together with offline warm start and online refinement strategy is proposed to efficiently search for the better temporal resolution configuration. Results show that the proposed cost-oriented UC temporal resolution determination method achieves enhanced cost efficiency.
Olfactory Inertial Odometry: Methodology for Effective Robot Navigation by Scent
Olfactory navigation is one of the most primitive mechanisms of exploration used by organisms. Navigation by machine olfaction (artificial smell) is a very difficult task to both simulate and solve. With this work, we define olfactory inertial odometry (OIO), a framework for using inertial kinematics, and fast-sampling olfaction sensors to enable navigation by scent analogous to visual inertial odometry (VIO). We establish how principles from SLAM and VIO can be extrapolated to olfaction to enable real-world robotic tasks. We demonstrate OIO with three different odour localization algorithms on a real 5-DoF robot arm over an odour-tracking scenario that resembles real applications in agriculture and food quality control. Our results indicate success in establishing a baseline framework for OIO from which other research in olfactory navigation can build, and we note performance enhancements that can be made to address more complex tasks in the future.
Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints
As UAV popularity soars, so does the mission planning associated with it. The classical approaches suffer from the triple problems of decoupled of task assignment and path planning, poor real-time performance and limited adaptability. Aiming at these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. Dubins path with multiple advantages bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques, task clustering preprocessing, highly efficient distance cost functions, low-complexity and less iterative task allocation strategies, are employed to guarantee the real-time performance of the algorithms. To cope with different emergencies and their simultaneous extremes, real-time planning of emerging tasks and mission replanning due to the reduction of available UAVs are appropriately handled. Finally, the developed algorithm is comprehensively exemplified and studied through simulations, highlighting that the proposed method only sacrifices 9.57% of the path length, while achieving a speed improvement of 4-5 orders of magnitude over the simulated annealing method, with a single mission planning of about 0.0003s.
Beacon-Based Feedback Control for Parking an Active-Joint Center-Articulated Mobile Robot CEC
This paper presents an autonomous parking control strategy for an active-joint center-articulated mobile robot. We first derive a kinematic model of the robot, then propose a control law to stabilize the vehicle's configuration within a small neighborhood of the goal. The control law, designed using Lyapunov techniques, is based on the robot's polar coordinate equations. A beacon-based guidance system provides feedback on the target's position and orientation. Simulations demonstrate the robot's ability to park successfully from arbitrary initial poses.
comment: IEEE Conference - CCECE 2010
Probabilistic Net Load Forecasting for High-Penetration RES Grids Utilizing Enhanced Conditional Diffusion Model
The proliferation of intermittent distributed renewable energy sources (RES) in modern power systems has fundamentally compromised the reliability and accuracy of deterministic net load forecasting. Generative models, particularly diffusion models, demonstrate exceptional potential in uncertainty quantification for scenario forecasting. Nevertheless, their probabilistic predictive capabilities and conditional bootstrapping mechanisms still remain underexplored. In this paper, a day-ahead probabilistic net load forecasting framework is developed by systematically quantifying epistemic uncertainty and aleatoric variability using the feature-informed enhanced conditional diffusion model (ECDM). The ECDM architecture implements the net load distribution generation process using an imputation-based conditional diffusion model, where multi-modal conditional inputs, such as weather and calendar data, are fused via cross-attention mechanisms. Specifically, historical net load profiles are utilized to guide the reverse diffusion trajectory through non-parametric imputation operators preserving spatial-temporal integrity. To capture periodic characteristics, a novel weekly arrangement method is also introduced, while an unconditional model is integrated to ensure diversity in the generated scenarios. Subsequently, the maximum probabilistic points and probability intervals of predicted net load are obtained by the adaptive kernel density estimation under RES intermittency. Moreover, ECDM is extented to multi-energy forecast framework, attempting to increase interpretability of the net load predictions. Numerical experiments on a publicly available dataset demonstrate the superior forecasting performance of the proposed method compared to existing state-of-the-art approaches.
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors, and are often pushed (through e.g. policy entropy regularisation) to randomise their actions in favor of exploration. This often makes it challenging for other agents and humans to predict an agent's behavior, triggering unsafe scenarios (e.g. in human-robot interaction). We propose a novel method to induce predictable behavior in RL agents, termed Predictability-Aware RL (PARL), employing the agent's trajectory entropy rate to quantify predictability. Our method maximizes a linear combination of a standard discounted reward and the negative entropy rate, thus trading off optimality with predictability. We show how the entropy rate can be formally cast as an average reward, how entropy-rate value functions can be estimated from a learned model and incorporate this in policy-gradient algorithms, and demonstrate how this approach produces predictable (near-optimal) policies in tasks inspired by human-robot use-cases.
Low-Rank Matrix Regression via Least-Angle Regression
Low-rank matrix regression is a fundamental problem in data science with various applications in systems and control. Nuclear norm regularization has been widely applied to solve this problem due to its convexity. However, it suffers from high computational complexity and the inability to directly specify the rank. This work introduces a novel framework for low-rank matrix regression that addresses both unstructured and Hankel matrices. By decomposing the low-rank matrix into rank-1 bases, the problem is reformulated as an infinite-dimensional sparse learning problem. The least-angle regression (LAR) algorithm is then employed to solve this problem efficiently. For unstructured matrices, a closed-form LAR solution is derived with equivalence to a normalized nuclear norm regularization problem. For Hankel matrices, a real-valued polynomial basis reformulation enables effective LAR implementation. Two numerical examples in network modeling and system realization demonstrate that the proposed approach significantly outperforms the nuclear norm method in terms of estimation accuracy and computational efficiency.
Data-assimilated model-informed reinforcement learning
The control of spatio-temporally chaos is challenging because of high dimensionality and unpredictability. Model-free reinforcement learning (RL) discovers optimal control policies by interacting with the system, typically requiring observations of the full physical state. In practice, sensors often provide only partial and noisy measurements (observations) of the system. The objective of this paper is to develop a framework that enables the control of chaotic systems with partial and noisy observability. The proposed method, data-assimilated model-informed reinforcement learning (DA-MIRL), integrates (i) low-order models to approximate high-dimensional dynamics; (ii) sequential data assimilation to correct the model prediction when observations become available; and (iii) an off-policy actor-critic RL algorithm to adaptively learn an optimal control strategy based on the corrected state estimates. We test DA-MIRL on the spatiotemporally chaotic solutions of the Kuramoto-Sivashinsky equation. We estimate the full state of the environment with (i) a physics-based model, here, a coarse-grained model; and (ii) a data-driven model, here, the control-aware echo state network, which is proposed in this paper. We show that DA-MIRL successfully estimates and suppresses the chaotic dynamics of the environment in real time from partial observations and approximate models. This work opens opportunities for the control of partially observable chaotic systems.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool's independent features, such as dependency graph generation, are implemented in a Python wrapper.
Joint Learning of Linear Dynamical Systems under Smoothness Constraints
We consider the problem of joint learning of multiple linear dynamical systems. This has received significant attention recently under different types of assumptions on the model parameters. The setting we consider involves a collection of $m$ linear systems, each of which resides on a node of a given undirected graph $G = ([m], \mathcal{E})$. We assume that the system matrices are marginally stable, and satisfy a smoothness constraint w.r.t $G$ -- akin to the quadratic variation of a signal on a graph. Given access to the states of the nodes over $T$ time points, we then propose two estimators for joint estimation of the system matrices, along with non-asymptotic error bounds on the mean-squared error (MSE). In particular, we show conditions under which the MSE converges to zero as $m$ increases, typically polynomially fast w.r.t $m$. The results hold under mild (i.e., $T \sim \log m$), or sometimes, even no assumption on $T$ (i.e. $T \geq 2$).
comment: 37 pages, 1 figure, corrected typos and made minor edits throughout, added remarks after Corollaries 1,2 and 3
Back to Base: Towards Hands-Off Learning via Safe Resets with Reach-Avoid Safety Filters
Designing controllers that accomplish tasks while guaranteeing safety constraints remains a significant challenge. We often want an agent to perform well in a nominal task, such as environment exploration, while ensuring it can avoid unsafe states and return to a desired target by a specific time. In particular we are motivated by the setting of safe, efficient, hands-off training for reinforcement learning in the real world. By enabling a robot to safely and autonomously reset to a desired region (e.g., charging stations) without human intervention, we can enhance efficiency and facilitate training. Safety filters, such as those based on control barrier functions, decouple safety from nominal control objectives and rigorously guarantee safety. Despite their success, constructing these functions for general nonlinear systems with control constraints and system uncertainties remains an open problem. This paper introduces a safety filter obtained from the value function associated with the reach-avoid problem. The proposed safety filter minimally modifies the nominal controller while avoiding unsafe regions and guiding the system back to the desired target set. By preserving policy performance while allowing safe resetting, we enable efficient hands-off reinforcement learning and advance the feasibility of safe training for real world robots. We demonstrate our approach using a modified version of soft actor-critic to safely train a swing-up task on a modified cartpole stabilization problem.
comment: The first three authors contributed equally to the work
Robotics
A Data-Based Architecture for Flight Test without Test Points
The justification for the "test point" derives from the test pilot's obligation to reproduce faithfully the pre-specified conditions of some model prediction. Pilot deviation from those conditions invalidates the model assumptions. Flight test aids have been proposed to increase accuracy on more challenging test points. However, the very existence of databands and tolerances is the problem more fundamental than inadequate pilot skill. We propose a novel approach, which eliminates test points. We start with a high-fidelity digital model of an air vehicle. Instead of using this model to generate a point prediction, we use a machine learning method to produce a reduced-order model (ROM). The ROM has two important properties. First, it can generate a prediction based on any set of conditions the pilot flies. Second, if the test result at those conditions differ from the prediction, the ROM can be updated using the new data. The outcome of flight test is thus a refined ROM at whatever conditions were flown. This ROM in turn updates and validates the high-fidelity model. We present a single example of this "point-less" architecture, using T-38C flight test data. We first use a generic aircraft model to build a ROM of longitudinal pitching motion as a hypersurface. We then ingest unconstrained flight test data and use Gaussian Process Regression to update and condition the hypersurface. By proposing a second-order equivalent system for the T-38C, this hypersurface then generates parameters necessary to assess MIL-STD-1797B compliance for longitudinal dynamics.
comment: The Society of Experimental Test Pilots Annual Symposium, vol. 68th, 2024
Efficient Manipulation-Enhanced Semantic Mapping With Uncertainty-Informed Action Selection
Service robots operating in cluttered human environments such as homes, offices, and schools cannot rely on predefined object arrangements and must continuously update their semantic and spatial estimates while dealing with possible frequent rearrangements. Efficient and accurate mapping under such conditions demands selecting informative viewpoints and targeted manipulations to reduce occlusions and uncertainty. In this work, we present a manipulation-enhanced semantic mapping framework for occlusion-heavy shelf scenes that integrates evidential metric-semantic mapping with reinforcement-learning-based next-best view planning and targeted action selection. Our method thereby exploits uncertainty estimates from the Dirichlet and Beta distributions in the semantic and occupancy prediction networks to guide both active sensor placement and object manipulation, focusing on areas of limited knowledge and selecting actions with high expected information gain. For object manipulation, we introduce an uncertainty-informed push strategy that targets occlusion-critical objects and generates minimally invasive actions to reveal hidden regions. The experimental evaluation shows that our framework highly reduces object displacement and drops while achieving a 95% reduction in planning time compared to the state-of-the-art, thereby realizing real-world applicability.
Active inference as a unified model of collision avoidance behavior in human drivers
Collision avoidance -- involving a rapid threat detection and quick execution of the appropriate evasive maneuver -- is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Safe and real-time navigation is fundamental for humanoid robot applications. However, existing bipedal robot navigation frameworks often struggle to balance computational efficiency with the precision required for stable locomotion. We propose a novel hierarchical framework that continuously generates dynamic subgoals to guide the robot through cluttered environments. Our method comprises a high-level reinforcement learning (RL) planner for subgoal selection in a robot-centric coordinate system and a low-level Model Predictive Control (MPC) based planner which produces robust walking gaits to reach these subgoals. To expedite and stabilize the training process, we incorporate a data bootstrapping technique that leverages a model-based navigation approach to generate a diverse, informative dataset. We validate our method in simulation using the Agility Robotics Digit humanoid across multiple scenarios with random obstacles. Results show that our framework significantly improves navigation success rates and adaptability compared to both the original model-based method and other learning-based methods.
comment: 8 pages, 5 figures, 3 tables
LoL-NMPC: Low-Level Dynamics Integration in Nonlinear Model Predictive Control for Unmanned Aerial Vehicles
In this paper, we address the problem of tracking high-speed agile trajectories for Unmanned Aerial Vehicles(UAVs), where model inaccuracies can lead to large tracking errors. Existing Nonlinear Model Predictive Controller(NMPC) methods typically neglect the dynamics of the low-level flight controllers such as underlying PID controller present in many flight stacks, and this results in sub-optimal tracking performance at high speeds and accelerations. To this end, we propose a novel NMPC formulation, LoL-NMPC, which explicitly incorporates low-level controller dynamics and motor dynamics in order to minimize trajectory tracking errors while maintaining computational efficiency. By leveraging linear constraints inside low-level dynamics, our approach inherently accounts for actuator constraints without requiring additional reallocation strategies. The proposed method is validated in both simulation and real-world experiments, demonstrating improved tracking accuracy and robustness at speeds up to 98.57 km/h and accelerations of 3.5 g. Our results show an average 21.97 % reduction in trajectory tracking error over standard NMPC formulation, with LoL-NMPC maintaining real-time feasibility at 100 Hz on an embedded ARM-based flight computer.
comment: Submitted Version
Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
Generalized policy and execution efficiency constitute the two critical challenges in robotic manipulation. While recent foundation policies benefit from the common-sense reasoning capabilities of internet-scale pretrained vision-language models (VLMs), they often suffer from low execution frequency. To mitigate this dilemma, dual-system approaches, inspired by Kahneman's theory, have been proposed to leverage a VLM-based System 2 model handling high-level reasoning and a separate System 1 action model ensuring real-time control. However, existing designs maintain both systems as separate models, limiting System 1 from fully leveraging the rich pretrained knowledge from the VLM-based System 2. In this work, we propose Fast-in-Slow (FiS), a unified dual-system vision-language-action (VLA) model that embeds the System 1 execution module within the VLM-based System 2 by partially sharing parameters. This innovative paradigm not only enables high-frequency execution in System 1 but also facilitates coordination between the reasoning and execution components within a single foundation model of System 2. Given their fundamentally distinct roles within FiS-VLA, we design the two systems to incorporate heterogeneous modality inputs alongside asynchronous operating frequencies, enabling both fast and precise manipulation. To enable coordination between the two systems, a dual-aware co-training strategy is proposed that equips System 1 with action generation capabilities while preserving System 2's contextual reasoning representation. For evaluation, FiS-VLA outperforms previous state-of-the-art methods by 8% in simulation and 11% in real-world tasks in terms of average success rate, while achieving a 117.7 Hz control frequency with action chunk set to eight. Project web page: fast-in-slow.github.io.
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation.
comment: 8 pages, 6 figures. Code: https://github.com/Eku127/DualMap Project page: https://eku127.github.io/DualMap/
Feel the Force: Contact-Driven Learning from Humans
Controlling fine-grained forces during manipulation remains a core challenge in robotics. While robot policies learned from robot-collected data or simulation show promise, they struggle to generalize across the diverse range of real-world interactions. Learning directly from humans offers a scalable solution, enabling demonstrators to perform skills in their natural embodiment and in everyday environments. However, visual demonstrations alone lack the information needed to infer precise contact forces. We present FeelTheForce (FTF): a robot learning system that models human tactile behavior to learn force-sensitive manipulation. Using a tactile glove to measure contact forces and a vision-based model to estimate hand pose, we train a closed-loop policy that continuously predicts the forces needed for manipulation. This policy is re-targeted to a Franka Panda robot with tactile gripper sensors using shared visual and action representations. At execution, a PD controller modulates gripper closure to track predicted forces-enabling precise, force-aware control. Our approach grounds robust low-level force control in scalable human supervision, achieving a 77% success rate across 5 force-sensitive manipulation tasks. Code and videos are available at https://feel-the-force-ftf.github.io.
FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
Enabling robots with contact-rich manipulation remains a pivotal challenge in robot learning, which is substantially hindered by the data collection gap, including its inefficiency and limited sensor setup. While prior work has explored handheld paradigms, their rod-based mechanical structures remain rigid and unintuitive, providing limited tactile feedback and posing challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable data collection device with dual visuo-tactile grippers, which can be worn by human fingers for intuitive and natural control. A high-precision optical tracking system is introduced to capture end-effector poses, while synchronizing visual and tactile feedback simultaneously. FreeTacMan achieves multiple improvements in data collection performance compared to prior works, and enables effective policy learning for contact-rich manipulation tasks with the help of the visuo-tactile information. We will release the work to facilitate reproducibility and accelerate research in visuo-tactile manipulation.
Online Competitive Information Gathering for Partially Observable Trajectory Games
Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the joint state space and performs stochastic gradient play. We also provide the necessary adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.
comment: Accepted at RSS 2025
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Vision-language models (VLMs) pretrained on large-scale multimodal datasets encode rich visual and linguistic knowledge, making them a strong foundation for robotics. Rather than training robotic policies from scratch, recent approaches adapt VLMs into vision-language-action (VLA) models that enable natural language-driven perception and control. However, existing VLAs are typically massive--often with billions of parameters--leading to high training costs and limited real-world deployability. Moreover, they rely on academic and industrial datasets, overlooking the growing availability of community-collected data from affordable robotic platforms. In this work, we present SmolVLA, a small, efficient, and community-driven VLA that drastically reduces both training and inference costs, while retaining competitive performance. SmolVLA is designed to be trained on a single GPU and deployed on consumer-grade GPUs or even CPUs. To further improve responsiveness, we introduce an asynchronous inference stack decoupling perception and action prediction from action execution, allowing higher control rates with chunked action generation. Despite its compact size, SmolVLA achieves performance comparable to VLAs that are 10x larger. We evaluate SmolVLA on a range of both simulated as well as real-world robotic benchmarks and release all code, pretrained models, and training data.
comment: 24 pages. Code and assets: https://github.com/huggingface/lerobot
unMORE: Unsupervised Multi-Object Segmentation via Center-Boundary Reasoning ICML 2025
We study the challenging problem of unsupervised multi-object segmentation on single images. Existing methods, which rely on image reconstruction objectives to learn objectness or leverage pretrained image features to group similar pixels, often succeed only in segmenting simple synthetic objects or discovering a limited number of real-world objects. In this paper, we introduce unMORE, a novel two-stage pipeline designed to identify many complex objects in real-world images. The key to our approach involves explicitly learning three levels of carefully defined object-centric representations in the first stage. Subsequently, our multi-object reasoning module utilizes these learned object priors to discover multiple objects in the second stage. Notably, this reasoning module is entirely network-free and does not require human labels. Extensive experiments demonstrate that unMORE significantly outperforms all existing unsupervised methods across 6 real-world benchmark datasets, including the challenging COCO dataset, achieving state-of-the-art object segmentation results. Remarkably, our method excels in crowded images where all baselines collapse.
comment: ICML 2025. Code and data are available at: https://github.com/vLAR-group/unMORE
ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real
Model-free reinforcement learning has emerged as a powerful method for developing robust robot control policies capable of navigating through complex and unstructured environments. The effectiveness of these methods hinges on two essential elements: (1) the use of massively parallel physics simulations to expedite policy training, and (2) an environment generator tasked with crafting sufficiently challenging yet attainable environments to facilitate continuous policy improvement. Existing methods of outdoor environment generation often rely on heuristics constrained by a set of parameters, limiting the diversity and realism. In this work, we introduce ADEPT, a novel \textbf{A}daptive \textbf{D}iffusion \textbf{E}nvironment for \textbf{P}olicy \textbf{T}ransfer in the zero-shot sim-to-real fashion that leverages Denoising Diffusion Probabilistic Models to dynamically expand existing training environments by adding more diverse and complex environments adaptive to the current policy. ADEPT guides the diffusion model's generation process through initial noise optimization, blending noise-corrupted environments from existing training environments weighted by the policy's performance in each corresponding environment. By manipulating the noise corruption level, ADEPT seamlessly transitions between generating similar environments for policy fine-tuning and novel ones to expand training diversity. To benchmark ADEPT in off-road navigation, we propose a fast and effective multi-layer map representation for wild environment generation. Our experiments show that the policy trained by ADEPT outperforms both procedural generated and natural environments, along with popular navigation methods.
Learning with pyCub: A New Simulation and Exercise Framework for Humanoid Robotics
We present pyCub, an open-source physics-based simulation of the humanoid robot iCub, along with exercises to teach students the basics of humanoid robotics. Compared to existing iCub similators (iCub SIM, iCub Gazebo), which require C++ code and YARP as middleware, pyCub works without YARP and with Python code. The complete robot with all articulations has been simulated, with two cameras in the eyes and the unique sensitive skin of the iCub comprising 4000 receptors on its body surface. The exercises range from basic control of the robot in velocity, joint, and Cartesian space to more complex tasks like gazing, grasping, or reactive control. The whole framework is written and controlled with Python, thus allowing to be used even by people with small or almost no programming practice. The exercises can be scaled to different difficulty levels. We tested the framework in two runs of a course on humanoid robotics. The simulation, exercises, documentation, Docker images, and example videos are publicly available at https://rustlluk.github.io/pyCub.
comment: Submitted for Humanoids 2025
Provably Safe Reinforcement Learning from Analytic Gradients
Deploying autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research which aims to provide such guarantees using safeguards. These safeguards should be integrated during training to prevent a large sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance and sample efficiency. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them with a state-of-the-art learning algorithm and a differentiable simulation. We evaluate how different safeguards affect policy optimisation using numerical experiments on two classical control tasks. The results demonstrate safeguarded training without compromising performance.
comment: 16 pages, 10 figures
Riemannian Time Warping: Multiple Sequence Alignment in Curved Spaces
Temporal alignment of multiple signals through time warping is crucial in many fields, such as classification within speech recognition or robot motion learning. Almost all related works are limited to data in Euclidean space. Although an attempt was made in 2011 to adapt this concept to unit quaternions, a general extension to Riemannian manifolds remains absent. Given its importance for numerous applications in robotics and beyond, we introduce Riemannian Time Warping~(RTW). This novel approach efficiently aligns multiple signals by considering the geometric structure of the Riemannian manifold in which the data is embedded. Extensive experiments on synthetic and real-world data, including tests with an LBR iiwa robot, demonstrate that RTW consistently outperforms state-of-the-art baselines in both averaging and classification tasks.
A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning
We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where the complete information about the item set and their sizes is known in advance, is proven to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method is capable of handling diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.
General agents need world models ICML 2025
Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent's policy, and that increasing the agents performance or the complexity of the goals it can achieve requires learning increasingly accurate world models. This has a number of consequences: from developing safe and general agents, to bounding agent capabilities in complex environments, and providing new algorithms for eliciting world models from agents.
comment: Accepted ICML 2025
WoMAP: World Models For Embodied Open-Vocabulary Object Localization
Language-instructed active object localization is a critical challenge for robots, requiring efficient exploration of partially observable environments. However, state-of-the-art approaches either struggle to generalize beyond demonstration datasets (e.g., imitation learning methods) or fail to generate physically grounded actions (e.g., VLMs). To address these limitations, we introduce WoMAP (World Models for Active Perception): a recipe for training open-vocabulary object localization policies that: (i) uses a Gaussian Splatting-based real-to-sim-to-real pipeline for scalable data generation without the need for expert demonstrations, (ii) distills dense rewards signals from open-vocabulary object detectors, and (iii) leverages a latent world model for dynamics and rewards prediction to ground high-level action proposals at inference time. Rigorous simulation and hardware experiments demonstrate WoMAP's superior performance in a broad range of zero-shot object localization tasks, with more than 9x and 2x higher success rates compared to VLM and diffusion policy baselines, respectively. Further, we show that WoMAP achieves strong generalization and sim-to-real transfer on a TidyBot.
FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Learning effective visuomotor policies for robotic manipulation is challenging, as it requires generating precise actions while maintaining computational efficiency. Existing methods remain unsatisfactory due to inherent limitations in the essential action representation and the basic network architectures. We observe that representing actions in the frequency domain captures the structured nature of motion more effectively: low-frequency components reflect global movement patterns, while high-frequency components encode fine local details. Additionally, robotic manipulation tasks of varying complexity demand different levels of modeling precision across these frequency bands. Motivated by this, we propose a novel paradigm for visuomotor policy learning that progressively models hierarchical frequency components. To further enhance precision, we introduce continuous latent representations that maintain smoothness and continuity in the action space. Extensive experiments across diverse 2D and 3D robotic manipulation benchmarks demonstrate that our approach outperforms existing methods in both accuracy and efficiency, showcasing the potential of a frequency-domain autoregressive framework with continuous tokens for generalized robotic manipulation.
Trajectory First: A Curriculum for Discovering Diverse Policies
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack in policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcoming of skill-based diversity optimization, and demonstrate empirically that our curriculum improves the diversity of the learned skills.
Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots
Effective human-robot interaction requires robots to identify human intentions and generate expressive, socially appropriate motions in real-time. Existing approaches often rely on fixed motion libraries or computationally expensive generative models. We propose a hierarchical framework that combines intention-aware reasoning via in-context learning (ICL) with real-time motion generation using diffusion models. Our system introduces structured prompting with confidence scoring, fallback behaviors, and social context awareness to enable intention refinement and adaptive response. Leveraging large-scale motion datasets and efficient latent-space denoising, the framework generates diverse, physically plausible gestures suitable for dynamic humanoid interactions. Experimental validation on a physical platform demonstrates the robustness and social alignment of our method in realistic scenarios.
comment: 7 pages, 2 figures, IEEE conference paper
SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation
Visual Semantic Navigation (VSN) is a fundamental problem in robotics, where an agent must navigate toward a target object in an unknown environment, mainly using visual information. Most state-of-the-art VSN models are trained in simulation environments, where rendered scenes of the real world are used, at best. These approaches typically rely on raw RGB data from the virtual scenes, which limits their ability to generalize to real-world environments due to domain adaptation issues. To tackle this problem, in this work, we propose SEMNAV, a novel approach that leverages semantic segmentation as the main visual input representation of the environment to enhance the agent's perception and decision-making capabilities. By explicitly incorporating high-level semantic information, our model learns robust navigation policies that improve generalization across unseen environments, both in simulated and real world settings. We also introduce a newly curated dataset, i.e. the SEMNAV dataset, designed for training semantic segmentation-aware navigation models like SEMNAV. Our approach is evaluated extensively in both simulated environments and with real-world robotic platforms. Experimental results demonstrate that SEMNAV outperforms existing state-of-the-art VSN models, achieving higher success rates in the Habitat 2.0 simulation environment, using the HM3D dataset. Furthermore, our real-world experiments highlight the effectiveness of semantic segmentation in mitigating the sim-to-real gap, making our model a promising solution for practical VSN-based robotic applications. We release SEMNAV dataset, code and trained models at https://github.com/gramuah/semnav
Captivity-Escape Games as a Means for Safety in Online Motion Generation
This paper presents a method that addresses the conservatism, computational effort, and limited numerical accuracy of existing frameworks and methods that ensure safety in online model-based motion generation, commonly referred to as fast and safe tracking. Computational limitations restrict online motion planning to low-fidelity models. However, planning with low-fidelity models compromises safety, as the dynamic feasibility of resulting reference trajectories is not ensured. This potentially leads to unavoidable tracking errors that may cause safety-critical constraint violations. Existing frameworks mitigate this safety risk by augmenting safety-critical constraints in motion planning by a safety margin that prevents constraint violations under worst-case tracking errors. However, the methods employed in these frameworks determine the safety margin based on a heuristically selected performance of the planning model, which likely results in overly conservative reference trajectories. Furthermore, these methods are computationally intensive, and the state-of-the-art method is limited in numerical accuracy. We adopt a different perspective and address these limitations with a method that mitigates conservatism in existing frameworks by adapting the planning model performance to a given safety margin. Our method achieves numerical accuracy and requires significantly less computation time than existing methods by leveraging a captivity-escape game, which is a specific zero-sum differential game formulated in this paper. We demonstrate our method using a numerical example and compare it to the state of the art.
Sparse Imagination for Efficient Visual World Model Planning
World model based planning has significantly improved decision-making in complex environments by enabling agents to simulate future states and make informed choices. However, ensuring the prediction accuracy of world models often demands substantial computational resources, posing a major challenge for real-time applications. This computational burden is particularly restrictive in robotics, where resources are severely constrained. To address this limitation, we propose a Sparse Imagination for Efficient Visual World Model Planning, which enhances computational efficiency by reducing the number of tokens processed during forward prediction. Our method leverages a sparsely trained vision-based world model based on transformers with randomized grouped attention strategy, allowing the model to adaptively adjust the number of tokens processed based on the computational resource. By enabling sparse imagination (rollout), our approach significantly accelerates planning while maintaining high control fidelity. Experimental results demonstrate that sparse imagination preserves task performance while dramatically improving inference efficiency, paving the way for the deployment of world models in real-time decision-making scenarios.
Generating Diverse Challenging Terrains for Legged Robots Using Quality-Diversity Algorithm ICRA 2025
While legged robots have achieved significant advancements in recent years, ensuring the robustness of their controllers on unstructured terrains remains challenging. It requires generating diverse and challenging unstructured terrains to test the robot and discover its vulnerabilities. This topic remains underexplored in the literature. This paper presents a Quality-Diversity framework to generate diverse and challenging terrains that uncover weaknesses in legged robot controllers. Our method, applied to both simulated bipedal and quadruped robots, produces an archive of terrains optimized to challenge the controller in different ways. Quantitative and qualitative analyses show that the generated archive effectively contains terrains that the robots struggled to traverse, presenting different failure modes. Interesting results were observed, including failure cases that were not necessarily expected. Experiments show that the generated terrains can also be used to improve RL-based controllers.
comment: Accepted to IEEE ICRA 2025 (7 pages)
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimations of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their framework. In this work, we propose a novel two-stage training framework to jointly synthesize the controller and Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training data sampling strategy and a domain updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield region of attractions with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
Variational Adaptive Noise and Dropout towards Stable Recurrent Neural Networks
This paper proposes a novel stable learning theory for recurrent neural networks (RNNs), so-called variational adaptive noise and dropout (VAND). As stabilizing factors for RNNs, noise and dropout on the internal state of RNNs have been separately confirmed in previous studies. We reinterpret the optimization problem of RNNs as variational inference, showing that noise and dropout can be derived simultaneously by transforming the explicit regularization term arising in the optimization problem into implicit regularization. Their scale and ratio can also be adjusted appropriately to optimize the main objective of RNNs, respectively. In an imitation learning scenario with a mobile manipulator, only VAND is able to imitate sequential and periodic behaviors as instructed. https://youtu.be/UOho3Xr6A2w
comment: 6 pages, 6 figures (accepted in ICDL2025)
AdaWorld: Learning Adaptable World Models with Latent Actions ICML 2025
World models aim to learn action-controlled future prediction and have proven essential for the development of intelligent agents. However, most existing world models rely heavily on substantial action-labeled data and costly training, making it challenging to adapt to novel environments with heterogeneous actions through limited interactions. This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. This is achieved by extracting latent actions from videos in a self-supervised manner, capturing the most critical transitions between frames. We then develop an autoregressive world model that conditions on these latent actions. This learning paradigm enables highly adaptable world models, facilitating efficient transfer and learning of new actions even with limited interactions and finetuning. Our comprehensive experiments across multiple environments demonstrate that AdaWorld achieves superior performance in both simulation quality and visual planning.
comment: ICML 2025. Project page: https://adaptable-world-model.github.io/, code: https://github.com/Little-Podi/AdaWorld, model: https://huggingface.co/Little-Podi/AdaWorld
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors CVPR 2025
We present a real-time monocular dense SLAM system designed bottom-up from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model beyond a unique camera centre. We introduce efficient methods for pointmap matching, camera tracking and local fusion, graph construction and loop closure, and second-order global optimisation. With known calibration, a simple modification to the system achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.
comment: CVPR 2025 Highlight. The first two authors contributed equally to this work. Project Page: https://edexheim.github.io/mast3r-slam/
Depth-Constrained ASV Navigation with Deep RL and Limited Sensing
Autonomous Surface Vehicles (ASVs) play a crucial role in maritime operations, yet their navigation in shallow-water environments remains challenging due to dynamic disturbances and depth constraints. Traditional navigation strategies struggle with limited sensor information, making safe and efficient operation difficult. In this paper, we propose a reinforcement learning (RL) framework for ASV navigation under depth constraints, where the vehicle must reach a target while avoiding unsafe areas with only a single depth measurement per timestep from a downward-facing Single Beam Echosounder (SBES). To enhance environmental awareness, we integrate Gaussian Process (GP) regression into the RL framework, enabling the agent to progressively estimate a bathymetric depth map from sparse sonar readings. This approach improves decision-making by providing a richer representation of the environment. Furthermore, we demonstrate effective sim-to-real transfer, ensuring that trained policies generalize well to real-world aquatic conditions. Experimental results validate our method's capability to improve ASV navigation performance while maintaining safety in challenging shallow-water environments.
comment: 8 pages, 8 figures
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planing, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our methods achieves 45.0 PDMS.
comment: 4pages
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image CVPR 2025
This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks and introduces a new benchmark dataset. While recent methods estimate part poses and sizes at the category level, they often rely on geometric cues and complex multi-stage pipelines that first segment parts from the point cloud, followed by Normalized Part Coordinate Space (NPCS) estimation for 6D poses. These approaches overlook dense semantic cues from RGB images, leading to suboptimal accuracy, particularly for objects with small parts. To address these limitations, we propose a single-stage Network, CAP-Net, for estimating the 6D poses and sizes of Categorical Articulated Parts. This method combines RGB-D features to generate instance segmentation and NPCS representations for each part in an end-to-end manner. CAP-Net uses a unified network to simultaneously predict point-wise class labels, centroid offsets, and NPCS maps. A clustering algorithm then groups points of the same predicted class based on their estimated centroid distances to isolate each part. Finally, the NPCS region of each part is aligned with the point cloud to recover its final pose and size. To bridge the sim-to-real domain gap, we introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date, featuring photorealistic RGB images and depth noise simulated from real sensors. Experimental evaluations on the RGBD-Art dataset demonstrate that our method significantly outperforms the state-of-the-art approach. Real-world deployments of our model in robotic tasks underscore its robustness and exceptional sim-to-real transfer capabilities, confirming its substantial practical utility. Our dataset, code and pre-trained models are available on the project page.
comment: To appear in CVPR 2025 (Highlight)
CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing
Robotic systems are increasingly employed for industrial automation, with contact-rich tasks like polishing requiring dexterity and compliant behaviour. These tasks are difficult to model, making classical control challenging. Deep reinforcement learning (RL) offers a promising solution by enabling the learning of models and control policies directly from data. However, its application to real-world problems is limited by data inefficiency and unsafe exploration. Adaptive hybrid RL methods blend classical control and RL adaptively, combining the strengths of both: structure from control and learning from RL. This has led to improvements in data efficiency and exploration safety. However, their potential for hardware applications remains underexplored, with no evaluations on physical systems to date. Such evaluations are critical to fully assess the practicality and effectiveness of these methods in real-world settings. This work presents an experimental demonstration of the hybrid RL algorithm CHEQ for robotic polishing with variable impedance, a task requiring precise force and velocity tracking. In simulation, we show that variable impedance enhances polishing performance. We compare standalone RL with adaptive hybrid RL, demonstrating that CHEQ achieves effective learning while adhering to safety constraints. On hardware, CHEQ achieves effective polishing behaviour, requiring only eight hours of training and incurring just five failures. These results highlight the potential of adaptive hybrid RL for real-world, contact-rich tasks trained directly on hardware.
POPGym Arcade: Parallel Pixelated POMDPs
We present the POPGym Arcade, a collection of hardware-accelerated, pixel-based environments with shared observation and action spaces. Each environment includes fully and partially observable variants, enabling counterfactual studies on partial observability. We also introduce mathematical tools for analyzing policies under partial observability, which reveal how agents recall past information to make decisions. Our analysis shows (1) that controlling for partial observability is critical and (2) that agents with long-term memory learn brittle policies that struggle to generalize. Finally, we demonstrate that recurrent policies can be "poisoned" by old, out-of-distribution observations, with implications for sim-to-real transfer, imitation learning, and offline reinforcement learning.
Autonomous Robotic Radio Source Localization via a Novel Gaussian Mixture Filtering Approach
This study proposes a new Gaussian Mixture Filter (GMF) to improve the estimation performance for the autonomous robotic radio signal source search and localization problem in unknown environments. The proposed filter is first tested with a benchmark numerical problem to validate the performance with other state-of-the-practice approaches such as Particle Filter (PF) and Particle Gaussian Mixture (PGM) filters. Then the proposed approach is tested and compared against PF and PGM filters in real-world robotic field experiments to validate its impact for real-world applications. The considered real-world scenarios have partial observability with the range-only measurement and uncertainty with the measurement model. The results show that the proposed filter can handle this partial observability effectively whilst showing improved performance compared to PF, reducing the computation requirements while demonstrating improved robustness over compared techniques.
Direct Kinematics, Inverse Kinematics, and Motion Planning of 1-DoF Rational Linkages
This study presents a set of algorithms that deal with trajectory planning of rational single-loop mechanisms with one degree of freedom (DoF). Benefiting from a dual quaternion representation of a rational motion, a formula for direct (forward) kinematics, a numerical inverse kinematics algorithm, and the generation of a driving-joint trajectory are provided. A novel approach using the Gauss-Newton search for the one-parameter inverse kinematics problem is presented. Additionally, a method for performing smooth equidistant travel of the tool is provided by applying arc-length reparameterization. This general approach can be applied to one-DoF mechanisms with four to seven joints characterized by a rational motion, without any additional geometrical analysis. An experiment was performed to demonstrate the usage in a laboratory setup.
comment: This is the final published version of the article, available as open access in Mechanism and Machine Theory, Volume 213, 2025. DOI: 10.1016/j.mechmachtheory.2025.106074. Licensed under CC BY license
Agile Decision-Making and Safety-Critical Motion Planning for Emergency Autonomous Vehicles
Efficiency is critical for autonomous vehicles (AVs), especially for emergency AVs. However, most existing methods focus on regular vehicles, overlooking the distinct strategies required by emergency vehicles to address the challenge of maximizing efficiency while ensuring safety. In this paper, we propose an Integrated Agile Decision-Making with Active and Safety-Critical Motion Planning System (IDEAM). IDEAM focuses on enabling emergency AVs, such as ambulances, to actively attain efficiency in dense traffic scenarios with safety in mind. Firstly, the speed-centric decision-making algorithm named the long short-term spatio-temporal graph-centric decision-making (LSGM) is given. LSGM comprises conditional depth-first search (C-DFS) for multiple paths generation as well as methods for speed gains and risk evaluation for path selection, which presents a robust algorithm for high efficiency and safety consideration. Secondly, with an output path from LSGM, the motion planner reconsiders environmental conditions to decide constraints states for the final planning stage, among which the lane-probing state is designed for actively attaining spatial and speed advantage. Thirdly, under the Frenet-based model predictive control (MPC) framework with final constraints state and selected path, the safety-critical motion planner employs decoupled discrete control barrier functions (DCBFs) and linearized discrete-time high-order control barrier functions (DHOCBFs) to model the constraints associated with different driving behaviors, making the optimal optimization problem convex. Finally, we extensively validate our system using scenarios from a randomly synthetic dataset, demonstrating its capability to achieve speed benefits and assure safety simultaneously.
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Humans practice slow thinking before performing actual actions when handling complex tasks in the physical world. This thinking paradigm, recently, has achieved remarkable advancement in boosting Large Language Models (LLMs) to solve complex tasks in digital domains. However, the potential of slow thinking remains largely unexplored for robotic foundation models interacting with the physical world. In this work, we propose Hume: a dual-system Vision-Language-Action (VLA) model with value-guided System-2 thinking and cascaded action denoising, exploring human-like thinking capabilities of Vision-Language-Action models for dexterous robot control. System 2 of Hume implements value-Guided thinking by extending a Vision-Language-Action Model backbone with a novel value-query head to estimate the state-action value of predicted actions. The value-guided thinking is conducted by repeat sampling multiple action candidates and selecting one according to state-action value. System 1 of Hume is a lightweight reactive visuomotor policy that takes System 2 selected action and performs cascaded action denoising for dexterous robot control. At deployment time, System 2 performs value-guided thinking at a low frequency while System 1 asynchronously receives the System 2 selected action candidate and predicts fluid actions in real time. We show that Hume outperforms the existing state-of-the-art Vision-Language-Action models across multiple simulation benchmark and real-robot deployments.
Human-Machine Interfaces for Subsea Telerobotics: From Soda-straw to Natural Language Interactions
This review explores the evolution of human-machine interfaces (HMIs) in subsea telerobotics, charting the progression from traditional first-person "soda-straw" consoles -- characterized by narrow field-of-view camera feeds -- to contemporary interfaces leveraging gesture recognition, virtual reality, and natural language processing. We systematically analyze the state-of-the-art literature through three interrelated perspectives: operator experience (including immersive feedback, cognitive workload, and ergonomic design), robotic autonomy (contextual understanding and task execution), and the quality of bidirectional communication between human and machine. Emphasis is placed on interface features to highlight persistent limitations in current systems, notably in immersive feedback fidelity, intuitive control mechanisms, and the lack of cross-platform standardization. Additionally, we assess the role of simulators and digital twins as scalable tools for operator training and system prototyping. The review extends beyond classical teleoperation paradigms to examine modern shared autonomy frameworks that facilitate seamless human-robot collaboration. By synthesizing insights from robotics, marine engineering, artificial intelligence, and human factors -- this work provides a comprehensive overview of the current landscape and emerging trajectories in subsea HMI development. Finally, we identify key challenges and open research questions and outline a forward-looking roadmap for advancing intelligent and user-centric HMI technologies in subsea telerobotics.
comment: 29 pages with 22 pages of content
Scalable Multi-Robot Informative Path Planning for Target Mapping via Deep Reinforcement Learning
Autonomous robots are widely utilized for mapping and exploration tasks due to their cost-effectiveness. Multi-robot systems offer scalability and efficiency, especially in terms of the number of robots deployed in more complex environments. These tasks belong to the set of Multi-Robot Informative Path Planning (MRIPP) problems. In this paper, we propose a deep reinforcement learning approach for the MRIPP problem. We aim to maximize the number of discovered stationary targets in an unknown 3D environment while operating under resource constraints (such as path length). Here, each robot aims to maximize discovered targets, avoid unknown static obstacles, and prevent inter-robot collisions while operating under communication and resource constraints. We utilize the centralized training and decentralized execution paradigm to train a single policy neural network. A key aspect of our approach is our coordination graph that prioritizes visiting regions not yet explored by other robots. Our learned policy can be copied onto any number of robots for deployment in more complex environments not seen during training. Our approach outperforms state-of-the-art approaches by at least 26.2% in terms of the number of discovered targets while requiring a planning time of less than 2 sec per step. We present results for more complex environments with up to 64 robots and compare success rates against baseline planners. Our code and trained model are available at - https://github.com/AccGen99/marl_ipp
Four Principles for Physically Interpretable World Models
As autonomous systems are increasingly deployed in open and uncertain settings, there is a growing need for trustworthy world models that can reliably predict future high-dimensional observations. The learned latent representations in world models lack direct mapping to meaningful physical quantities and dynamics, limiting their utility and interpretability in downstream planning, control, and safety verification. In this paper, we argue for a fundamental shift from physically informed to physically interpretable world models - and crystallize four principles that leverage symbolic knowledge to achieve these ends: (1) functionally organizing the latent space according to the physical intent, (2) learning aligned invariant and equivariant representations of the physical world, (3) integrating multiple forms and strengths of supervision into a unified training process, and (4) partitioning generative outputs to support scalability and verifiability. We experimentally demonstrate the value of each principle on two benchmarks. This paper opens several intriguing research directions to achieve and capitalize on full physical interpretability in world models.
comment: Equal contribution by the first two authors
From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning
Humans efficiently generalize from limited demonstrations, but robots still struggle to transfer learned knowledge to complex, unseen tasks with longer horizons and increased complexity. We propose the first known method enabling robots to autonomously invent relational concepts directly from small sets of unannotated, unsegmented demonstrations. The learned symbolic concepts are grounded into logic-based world models, facilitating efficient zero-shot generalization to significantly more complex tasks. Empirical results demonstrate that our approach achieves performance comparable to hand-crafted models, successfully scaling execution horizons and handling up to 18 times more objects than seen in training, providing the first autonomous framework for learning transferable symbolic abstractions from raw robot trajectories.
Steering Elongate Multi-legged Robots By Modulating Body Undulation Waves
Centipedes exhibit great maneuverability in diverse environments due to their many legs and body-driven control. By leveraging similar morphologies and control strategies, their robotic counterparts also demonstrate effective terrestrial locomotion. However, the success of these multi-legged robots is largely limited to forward locomotion; steering is substantially less studied, in part because of the difficulty in coordinating a high degree-of-freedom robot to follow predictable, planar trajectories. To resolve these challenges, we take inspiration from control schemes based on geometric mechanics(GM) in elongate system's locomotion through highly damped environments. We model the elongate, multi-legged system as a ``terrestrial swimmer" in highly frictional environments and implement steering schemes derived from low-order templates of elongate, limbless systems. We identify an effective turning strategy by superimposing two traveling waves of lateral body undulation and further explore variations of the ``turning wave" to enable a spectrum of arc-following steering primitives. We test our hypothesized modulation scheme on a robophysical model and validate steering trajectories against theoretically predicted displacements. We then apply our control framework to Ground Control Robotics' elongate multi-legged robot, Major Tom, using these motion primitives to construct planar motion and in closed-loop control on different terrains. Our work creates a systematic framework for controlling these highly mobile devices in the plane using a low-order model based on sequences of body shape changes.
SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model ICRA 2025
We introduce SeaSplat, a method to enable real-time rendering of underwater scenes leveraging recent advances in 3D radiance fields. Underwater scenes are challenging visual environments, as rendering through a medium such as water introduces both range and color dependent effects on image capture. We constrain 3D Gaussian Splatting (3DGS), a recent advance in radiance fields enabling rapid training and real-time rendering of full 3D scenes, with a physically grounded underwater image formation model. Applying SeaSplat to the real-world scenes from SeaThru-NeRF dataset, a scene collected by an underwater vehicle in the US Virgin Islands, and simulation-degraded real-world scenes, not only do we see increased quantitative performance on rendering novel viewpoints from the scene with the medium present, but are also able to recover the underlying true color of the scene and restore renders to be without the presence of the intervening medium. We show that the underwater image formation helps learn scene structure, with better depth maps, as well as show that our improvements maintain the significant computational improvements afforded by leveraging a 3D Gaussian representation.
comment: ICRA 2025. Project page here: https://seasplat.github.io
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Equivariant models have recently been shown to improve the data efficiency of diffusion policy by a significant margin. However, prior work that explored this direction focused primarily on point cloud inputs generated by multiple cameras fixed in the workspace. This type of point cloud input is not compatible with the now-common setting where the primary input modality is an eye-in-hand RGB camera like a GoPro. This paper closes this gap by incorporating into the diffusion policy model a process that projects features from the 2D RGB camera image onto a sphere. This enables us to reason about symmetries in SO(3) without explicitly reconstructing a point cloud. We perform extensive experiments in both simulation and the real world that demonstrate that our method consistently outperforms strong baselines in terms of both performance and sample efficiency. Our work is the first SO(3)-equivariant policy learning framework for robotic manipulation that works using only monocular RGB inputs.
Multiagent Systems
Fairly Wired: Towards Leximin-Optimal Division of Electricity
In many parts of the world - particularly in developing countries - the demand for electricity exceeds the available supply. In such cases, it is impossible to provide electricity to all households simultaneously. This raises a fundamental question: how should electricity be allocated fairly? In this paper, we explore this question through the lens of egalitarianism - a principle that emphasizes equality by prioritizing the welfare of the worst-off households. One natural rule that aligns with this principle is to maximize the egalitarian welfare - the smallest utility across all households. We show that computing such an allocation is NP-hard, even under strong simplifying assumptions. Leximin is a stronger fairness notion that generalizes the egalitarian welfare: it also requires to maximize the smallest utility, but then, subject to that, the second-smallest, then the third, and so on. The hardness results extends directly to leximin as well. Despite this, we present a Fully Polynomial-Time Approximation Scheme (FPTAS) for leximin in the special case where the network connectivity graph is a tree. This means that we can efficiently approximate leximin - and, in particular, the egalitarian welfare - to any desired level of accuracy.
Online Competitive Information Gathering for Partially Observable Trajectory Games
Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the joint state space and performs stochastic gradient play. We also provide the necessary adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.
comment: Accepted at RSS 2025
Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research
As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.
Agentic AI and Multiagentic: Are We Reinventing the Wheel?
The terms Agentic AI and Multiagentic AI have recently gained popularity in discussions on generative artificial intelligence, often used to describe autonomous software agents and systems composed of such agents. However, the use of these terms confuses these buzzwords with well-established concepts in AI literature: intelligent agents and multi-agent systems. This article offers a critical analysis of this conceptual misuse. We review the theoretical origins of "agentic" in the social sciences (Bandura, 1986) and philosophical notions of intentionality (Dennett, 1971), and then summarise foundational works on intelligent agents and multi-agent systems by Wooldridge, Jennings and others. We examine classic agent architectures, from simple reactive agents to Belief-Desire-Intention (BDI) models, and highlight key properties (autonomy, reactivity, proactivity, social capability) that define agency in AI. We then discuss recent developments in large language models (LLMs) and agent platforms based on LLMs, including the emergence of LLM-powered AI agents and open-source multi-agent orchestration frameworks. We argue that the term AI Agentic is often used as a buzzword for what are essentially AI agents, and AI Multiagentic for what are multi-agent systems. This confusion overlooks decades of research in the field of autonomous agents and multi-agent systems. The article advocates for scientific and technological rigour and the use of established terminology from the state of the art in AI, incorporating the wealth of existing knowledge, including standards for multi-agent system platforms, communication languages and coordination and cooperation algorithms, agreement technologies (automated negotiation, argumentation, virtual organisations, trust, reputation, etc.), into the new and promising wave of LLM-based AI agents, so as not to end up reinventing the wheel.
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs ICASSP 10
This paper introduces an innovative error feedback framework designed to mitigate quantization noise in distributed graph filtering, where communications are constrained to quantized messages. It comes from error spectrum shaping techniques from state-space digital filters, and therefore establishes connections between quantized filtering processes over different domains. In contrast to existing error compensation methods, our framework quantitatively feeds back the quantization noise for exact compensation. We examine the framework under three key scenarios: (i) deterministic graph filtering, (ii) graph filtering over random graphs, and (iii) graph filtering with random node-asynchronous updates. Rigorous theoretical analysis demonstrates that the proposed framework significantly reduces the effect of quantization noise, and we provide closed-form solutions for the optimal error feedback coefficients. Moreover, this quantitative error feedback mechanism can be seamlessly integrated into communication-efficient decentralized optimization frameworks, enabling lower error floors. Numerical experiments validate the theoretical results, consistently showing that our method outperforms conventional quantization strategies in terms of both accuracy and robustness.
comment: Journal Paper from ICASSP 10.1109/ICASSP49660.2025.10888821
An Empirical Study of Group Conformity in Multi-Agent Systems
Recent advances in Large Language Models (LLMs) have enabled multi-agent systems that simulate real-world interactions with near-human reasoning. While previous studies have extensively examined biases related to protected attributes such as race, the emergence and propagation of biases on socially contentious issues in multi-agent LLM interactions remain underexplored. This study explores how LLM agents shape public opinion through debates on five contentious topics. By simulating over 2,500 debates, we analyze how initially neutral agents, assigned a centrist disposition, adopt specific stances over time. Statistical analyses reveal significant group conformity mirroring human behavior; LLM agents tend to align with numerically dominant groups or more intelligent agents, exerting a greater influence. These findings underscore the crucial role of agent intelligence in shaping discourse and highlight the risks of bias amplification in online interactions. Our results emphasize the need for policy measures that promote diversity and transparency in LLM-generated discussions to mitigate the risks of bias propagation within anonymous online environments.
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extensive failure logs from 127 LLM multi-agent systems with fine-grained annotations linking failures to specific agents and decisive error steps. Using the Who&When, we develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps, with some methods performing below random. Even SOTA reasoning models, such as OpenAI o1 and DeepSeek R1, fail to achieve practical usability. These results highlight the task's complexity and the need for further research in this area. Code and dataset are available at https://github.com/mingyin1/Agents_Failure_Attribution
comment: camera-ready
Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming
State-of-the-art methods for Human-AI Teaming and Zero-shot Cooperation focus on task completion, i.e., task rewards, as the sole evaluation metric while being agnostic to how the two agents work with each other. Furthermore, subjective user studies only offer limited insight into the quality of cooperation existing within the team. Specifically, we are interested in understanding the cooperative behaviors arising within the team when trained agents are paired with humans -- a problem that has been overlooked by the existing literature. To formally address this problem, we propose the concept of constructive interdependence -- measuring how much agents rely on each other's actions to achieve the shared goal -- as a key metric for evaluating cooperation in human-agent teams. We interpret interdependence in terms of action interactions in a STRIPS formalism, and define metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art agents HAT with learned human models as well as human participants in a user study for the popular Overcooked domain, and evaluate the task reward and teaming performance for these human-agent teams. Our results demonstrate that although trained agents attain high task rewards, they fail to induce cooperative behavior, showing very low levels of interdependence across teams. Furthermore, our analysis reveals that teaming performance is not necessarily correlated with task reward, highlighting that task reward alone cannot reliably measure cooperation arising in a team.
CleanAgent: Automating Data Standardization with LLM-based Agents
Data standardization is a crucial part of the data science life cycle. While tools like Pandas offer robust functionalities, their complexity and the manual effort required for customizing code to diverse column types pose significant challenges. Although large language models (LLMs) like ChatGPT have shown promise in automating this process through natural language understanding and code generation, it still demands expert-level programming knowledge and continuous interaction for prompt refinement. To solve these challenges, our key idea is to propose a Python library with declarative, unified APIs for standardizing different column types, simplifying the LLM's code generation with concise API calls. We first propose Dataprep.Clean, a component of the Dataprep Python Library, significantly reduces the coding complexity by enabling the standardization of specific column types with a single line of code. Then, we introduce the CleanAgent framework integrating Dataprep.Clean and LLM-based agents to automate the data standardization process. With CleanAgent, data scientists only need to provide their requirements once, allowing for a hands-free process. To demonstrate the practical utility of CleanAgent, we developed a user-friendly web application, allowing users to interact with it using real-world datasets.
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. In this paper, we present 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through four temperament-based Patient Agents and an Assessor Agent that jointly evaluate diagnostic accuracy and dialogue quality. It includes 3013 cases across 34 diagnoses drawn from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for popular LVLMs, including GPT-4o-mini, LLaVA-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional network into the LVLM's context boosts F1 by up to 20%. Source code is available at https://anonymous.4open.science/r/3mdbench_acl-0511.
comment: 35 pages, 13 figures, 7 tables
Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models
To balance the quality and inference cost of a Foundation Model (FM, such as large language models (LLMs)) powered software, people often opt to train a routing model that routes requests to FMs with different sizes and capabilities. Existing routing models rely on learning the optimal routing decision from carefully curated data, require complex computations to be updated, and do not consider the potential evolution of weaker FMs. In this paper, we propose Real-time Adaptive Routing (RAR), an approach to continuously adapt FM routing decisions while using guided in-context learning to enhance the capabilities of weaker FM. The goal is to reduce reliance on stronger, more expensive FMs. We evaluate our approach on different subsets of the popular MMLU benchmark. Over time, our approach routes 50.2% fewer requests to computationally expensive models while maintaining around 90.5% of the general response quality. In addition, the guides generated from stronger models have shown intra-domain generalization and led to a better quality of responses compared to an equivalent approach with a standalone weaker FM.
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
As large language models (LLMs) start interacting with each other and generating an increasing amount of text online, it becomes crucial to better understand how information is transformed as it passes from one LLM to the next. While significant research has examined individual LLM behaviors, existing studies have largely overlooked the collective behaviors and information distortions arising from iterated LLM interactions. Small biases, negligible at the single output level, risk being amplified in iterated interactions, potentially leading the content to evolve towards attractor states. In a series of telephone game experiments, we apply a transmission chain design borrowed from the human cultural evolution literature: LLM agents iteratively receive, produce, and transmit texts from the previous to the next agent in the chain. By tracking the evolution of text toxicity, positivity, difficulty, and length across transmission chains, we uncover the existence of biases and attractors, and study their dependence on the initial text, the instructions, language model, and model size. For instance, we find that more open-ended instructions lead to stronger attraction effects compared to more constrained tasks. We also find that different text properties display different sensitivity to attraction effects, with toxicity leading to stronger attractors than length. These findings highlight the importance of accounting for multi-step transmission dynamics and represent a first step towards a more comprehensive understanding of LLM cultural dynamics.
comment: Code available at https://github.com/jeremyperez2/TelephoneGameLLM. Companion website with a Data Explorer tool at https://sites.google.com/view/telephone-game-llm
Simulation of Crowd Egress with Environmental Stressors
This article introduces a modeling framework to characterize evacuee response to environmental stimuli during emergency egress. The model is developed in consistency with stress theory, which explains how an organism reacts to environmental stressors. We integrate the theory into the well-known social-force model, and develop a framework to simulate crowd evacuation behavior in multi-compartment buildings. Our method serves as a theoretical basis to study crowd movement at bottlenecks, and simulate their herding and way-finding behavior in normal and hazardous conditions. The pre-movement behavior is also briefly investigated by using opinion dynamics with social group model. The algorithms have been partly tested in FDS+EVAC as well as our simulation platform crowdEgress.
comment: 30 pages, 19 figures
Systems and Control (CS)
SIL Allocation for Mitigation Safety Functions
SIL (Safety Integrity Level) allocation plays a pivotal role in evaluating the significance of Safety Functions (SFs) within high-risk industries. The outcomes of a SIL allocation study determine the design specifications necessary to uphold the Probability of Failure on Demand (PFD) below permissible limits, thus managing risk effectively. While extensive research has focused on SIL allocation for preventive SFs, there is a noticeable gap in attention towards mitigation SFs. To address this gap, this paper discusses the shortcomings of current methods and proposes a new approach to overcome them. The principles of the proposed method are substantiated by detailed mathematical formulation and the practical application of the method is demonstrated through a case study in a road tunnel project.
Experimental Covert Communication Using Software-Defined Radio
The fundamental information-theoretic limits of covert, or low probability of detection (LPD), communication have been extensively studied for over a decade, resulting in the square root law (SRL): only $L\sqrt{n}$ covert bits can be reliably transmitted over time-bandwidth product $n$, for constant $L>0$. Transmitting more either results in detection or decoding errors. The SRL imposes significant constraints on hardware realization of provably-secure covert communication. Thus, experimental validation of covert communication is underexplored: to date, only two experimental studies of SRL-based covert communication are available, both focusing on optical channels. Here, we report our initial results demonstrating the provably-secure covert radio-frequency (RF) communication using software-defined radios (SDRs). These validate theoretical predictions, open practical avenues for implementing covert communication systems, as well as raise future research questions.
comment: 8 pages, 5 figures
Second-order AAA algorithms for structured data-driven modeling
The data-driven modeling of dynamical systems has become an essential tool for the construction of accurate computational models from real-world data. In this process, the inherent differential structures underlying the considered physical phenomena are often neglected making the reinterpretation of the learned models in a physically meaningful sense very challenging. In this work, we present three data-driven modeling approaches for the construction of dynamical systems with second-order differential structure directly from frequency domain data. Based on the second-order structured barycentric form, we extend the well-known Adaptive Antoulas-Anderson algorithm to the case of second-order systems. Depending on the available computational resources, we propose variations of the proposed method that prioritize either higher computation speed or greater modeling accuracy, and we present a theoretical analysis for the expected accuracy and performance of the proposed methods. Three numerical examples demonstrate the effectiveness of our new structured approaches in comparison to classical unstructured data-driven modeling.
comment: 37 pages, 6 figures, 3 tables
Second-Order-Cone Formulations of Power Flow for Topology Optimization
Optimization problems that involve topology optimization in scenarios with large scale outages, such as post-disaster restoration or public safety power shutoff planning, are very challenging to solve. Using simple power flow representations such as DC power flow or network flow models results in low quality solutions which requires significantly higher-than-predicted load shed to become AC feasible. Recent work has shown that formulations based on the Second Order Cone (SOC) power flow formulation find very high quality solutions with low load shed, but the computational burden of these formulations remains a significant challenge. With the aim of reducing computational time while maintaining high solution quality, this work explores formulations which replace the conic constraints with a small number of linear cuts. The goal of this approach is not to find an exact power flow solution, but rather to identify good binary decisions, where the power flow can be resolved after the binary variables are fixed. We find that a simple reformulation of the Second Order Cone Optimal Power Shutoff problem can greatly improve the solution speed, but that a full linearization of the SOC voltage cone equation results in an overestimation of the amount of power that can be delivered to loads.
Active inference as a unified model of collision avoidance behavior in human drivers
Collision avoidance -- involving a rapid threat detection and quick execution of the appropriate evasive maneuver -- is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
Bregman Centroid Guided Cross-Entropy Method
The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages $\textit{Bregman centroids}$ for principled information aggregation and diversity control. $\textbf{$\mathcal{BC}$-EvoCEM}$ computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\textbf{$\mathcal{BC}$-EvoCEM}$ integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\textbf{$\mathcal{BC}$-EvoCEM}$ enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
Optimal Coordination of Flexible DERs in Local Energy and Flexibility Markets to Ensure Social Equity
Local electricity markets offer a promising solution for integrating renewable energy sources and other distributed energy resources (DERs) into distribution networks. These markets enable the effective utilization of flexible resources by facilitating coordination among various agents. Beyond technical and economic considerations, addressing social equity within these local communities is critical and requires dedicated attention in market-clearing frameworks. This paper proposes a social equity-based market-clearing framework for the optimal management of DERs' energy and flexibility within local communities. The proposed framework incorporates consumers' energy burden to ensure fair pricing in energy market clearance. Furthermore, to ensure equity during unbalanced operating conditions, flexible resources are managed in the local flexibility market, ensuring that all participants can trade power fairly under network disturbances. The model is formulated as a second-order cone programming (SOCP) optimization and validated on the IEEE 33-bus test distribution network.
comment: Accepted for presenting in IEEE PES General Meeting 2025
Hybrid SIS Dynamics for Demand Modeling of Frequently Updated Products
We propose a hybrid spreading process model to capture the dynamics of demand for software-based products. We introduce discontinuous jumps in the state to model sudden surges in demand that can be seen immediately after a product update is released. After each update, the modeled demand evolves according to a continuous-time susceptible-infected-susceptible (SIS) epidemic model. We identify the necessary and sufficient conditions for estimating the hybrid model's parameters for an arbitrary finite number of sequential updates. We verify the parameter estimation conditions in simulation, and evaluate how the estimation of these parameters is impacted by the presence of observation and process noise. We then validate our model by applying our estimation method to daily user engagement data for a regularly updating software product, the live-service video game `Apex Legends.'
Systematic Hazard Analysis for Frontier AI using STPA
All of the frontier AI companies have published safety frameworks where they define capability thresholds and risk mitigations that determine how they will safely develop and deploy their models. Adoption of systematic approaches to risk modelling, based on established practices used in safety-critical industries, has been recommended, however frontier AI companies currently do not describe in detail any structured approach to identifying and analysing hazards. STPA (Systems-Theoretic Process Analysis) is a systematic methodology for identifying how complex systems can become unsafe, leading to hazards. It achieves this by mapping out controllers and controlled processes then analysing their interactions and feedback loops to understand how harmful outcomes could occur (Leveson & Thomas, 2018). We evaluate STPA's ability to broaden the scope, improve traceability and strengthen the robustness of safety assurance for frontier AI systems. Applying STPA to the threat model and scenario described in 'A Sketch of an AI Control Safety Case' (Korbak et al., 2025), we derive a list of Unsafe Control Actions. From these we select a subset and explore the Loss Scenarios that lead to them if left unmitigated. We find that STPA is able to identify causal factors that may be missed by unstructured hazard analysis methodologies thereby improving robustness. We suggest STPA could increase the safety assurance of frontier AI when used to complement or check coverage of existing AI governance techniques including capability thresholds, model evaluations and emergency procedures. The application of a systematic methodology supports scalability by increasing the proportion of the analysis that could be conducted by LLMs, reducing the burden on human domain experts.
comment: 29 pages, 5 figures, 7 tables
ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real
Model-free reinforcement learning has emerged as a powerful method for developing robust robot control policies capable of navigating through complex and unstructured environments. The effectiveness of these methods hinges on two essential elements: (1) the use of massively parallel physics simulations to expedite policy training, and (2) an environment generator tasked with crafting sufficiently challenging yet attainable environments to facilitate continuous policy improvement. Existing methods of outdoor environment generation often rely on heuristics constrained by a set of parameters, limiting the diversity and realism. In this work, we introduce ADEPT, a novel \textbf{A}daptive \textbf{D}iffusion \textbf{E}nvironment for \textbf{P}olicy \textbf{T}ransfer in the zero-shot sim-to-real fashion that leverages Denoising Diffusion Probabilistic Models to dynamically expand existing training environments by adding more diverse and complex environments adaptive to the current policy. ADEPT guides the diffusion model's generation process through initial noise optimization, blending noise-corrupted environments from existing training environments weighted by the policy's performance in each corresponding environment. By manipulating the noise corruption level, ADEPT seamlessly transitions between generating similar environments for policy fine-tuning and novel ones to expand training diversity. To benchmark ADEPT in off-road navigation, we propose a fast and effective multi-layer map representation for wild environment generation. Our experiments show that the policy trained by ADEPT outperforms both procedural generated and natural environments, along with popular navigation methods.
Generalized Super-Twisting Observer for a class of interconnected nonlinear systems with uncertainties
The Generalized Super-Twisting Observer (GSTO) is extended for a strongly observable class of nonlinearly interconnected systems with bounded uncertainties/perturbations. A nonsmooth strong Lyapunov function is used to prove the finite-time convergence of the proposed observer to the true system's trajectories, in the presence of the uncertainties. A case study on the interaction between two food production systems is presented, comparing the proposed observer with the High Gain observer. The results emphasize the critical role of the GSTO's discontinuous term in achieving exact estimation.
Smooth Logic Constraints in Nonlinear Optimization and Optimal Control Problems
In some optimal control problems, complex relationships between states and inputs cannot be easily represented using continuous constraints, necessitating the use of discrete logic instead. This paper presents a method for incorporating such logic constraints directly within continuous optimization frameworks, eliminating the need for binary variables or specialized solvers. Our approach reformulates arbitrary logic constraints under minimal assumptions as max-min constraints, which are then smoothed by introducing auxiliary variables into the optimization problem. When these reformulated constraints are satisfied, they guarantee that the original logical conditions hold, ensuring correctness in the optimization process. We demonstrate the effectiveness of this method on two planar quadrotor control tasks with complex logic constraints. Compared to existing techniques for encoding logic in continuous optimization, our approach achieves faster computational performance and improved convergence to feasible solutions.
comment: 6 pages, 7 figures, submitted to the 2025 IEEE Conference on Decision and Control
Update-Aware Robust Optimal Model Predictive Control for Nonlinear Systems
Robust optimal or min-max model predictive control (MPC) approaches aim to guarantee constraint satisfaction over a known, bounded uncertainty set while minimizing a worst-case performance bound. Traditionally, these methods compute a trajectory that meets the desired properties over a fixed prediction horizon, apply a portion of the resulting input, and then re-solve the MPC problem using newly obtained measurements at the next time step. However, this approach fails to account for the fact that the control trajectory will be updated in the future, potentially leading to conservative designs. In this paper, we present a novel update-aware robust optimal MPC algorithm for decreasing horizon problems on nonlinear systems that explicitly accounts for future control trajectory updates. This additional insight allows our method to provably expand the feasible solution set and guarantee improved worst-case performance bounds compared to existing techniques. Our approach formulates the trajectory generation problem as a sequence of nested existence-constrained semi-infinite programs (SIPs), which can be efficiently solved using local reduction techniques. To demonstrate its effectiveness, we evaluate our approach on a planar quadrotor problem, where it clearly outperforms an equivalent method that does not account for future updates at the cost of increased computation time.
comment: 6 pages, 2 figures, to be published in the IEEE Control System Letters
A Vertical Approach to Designing and Managing Sustainable Heterogeneous Edge Data Centers
The increasing demand for Artificial Intelligence (AI) computing poses significant environmental challenges, with both operational and embodied carbon emissions becoming major contributors. This paper presents a carbon-aware holistic methodology for designing and managing sustainable Edge Data Centers (EDCs), based on three design principles that challenge the state-of-the-art optimization paradigms. Our approach employs vertical integration across the architecture, system, and runtime layers, balances operational and embodied carbon emissions while considering EDC performance as a co-optimization objective, rather than a constraint. At the architecture level, we propose carbon-aware and approximate accelerator designs to reduce embodied carbon. At the system level, we enhance resource utilization and adapt to real-time carbon intensity variations to minimize operational emissions. Finally, at the runtime level, we develop dynamic scheduling frameworks that adjust execution, based on energy constraints and carbon intensity.
comment: IEEE Computer Society Annual Symposium on VLSI (ISVLSI) 2025
Interpretable reinforcement learning for heat pump control through asymmetric differentiable decision trees
In recent years, deep reinforcement learning (DRL) algorithms have gained traction in home energy management systems. However, their adoption by energy management companies remains limited due to the black-box nature of DRL, which fails to provide transparent decision-making feedback. To address this, explainable reinforcement learning (XRL) techniques have emerged, aiming to make DRL decisions more transparent. Among these, soft differential decision tree (DDT) distillation provides a promising approach due to the clear decision rules they are based on, which can be efficiently computed. However, achieving high performance often requires deep, and completely full, trees, which reduces interpretability. To overcome this, we propose a novel asymmetric soft DDT construction method. Unlike traditional soft DDTs, our approach adaptively constructs trees by expanding nodes only when necessary. This improves the efficient use of decision nodes, which require a predetermined depth to construct full symmetric trees, enhancing both interpretability and performance. We demonstrate the potential of asymmetric DDTs to provide transparent, efficient, and high-performing decision-making in home energy management systems.
comment: 7 pages, 3 figures, conference
Equivalence of Left- and Right-Invariant Extended Kalman Filters on Matrix Lie Groups
This paper derives the extended Kalman filter (EKF) for continuous-time systems on matrix Lie groups observed through discrete-time measurements. By modeling the system noise on the Lie algebra and adopting a Stratonovich interpretation for the stochastic differential equation (SDE), we ensure that solutions remain on the manifold. The derivation of the filter follows classical EKF principles, naturally integrating a necessary full-order covariance reset post-measurement update. A key contribution is proving that this full-order covariance reset guarantees that the Lie-group-valued state estimate is invariant to whether a left- or right-invariant error definition is used in the EKF. Monte Carlo simulations of the aided inertial navigation problem validate the invariance property and confirm its absence when employing reduced-order covariance resets.
comment: This work has been submitted to the IEEE for possible publication
A State of the Art on Recent Progress and Emerging Challenges on Energy Transfer Between Vibrating Modes Under an External Mechanical Force With Time-Varying Frequency From 2020 to 2025
In this paper, we discuss an example of current importance with a future perspective in engineering, in which excitation sources always have limited power, limited inertia, and their frequencies vary according to the instantaneous state of the vibrating system. Practical examples of non-ideal systems are considered. The most common phenomenon for this kind of system is discussed. The period considered is from 2020 to 2025. The specific properties of various models are also discussed. Directions for future investigations are provided. In this paper, the authors revisited some publications based on the assumption that the external excitations are produced by non-ideal sources (RNIS), that is, with limited power supply. Among these applications, nonlinear phenomena such as the Sommerfeld effect and saturation phenomenon were observed, considering fractional damping. Energy harvesters and the Jacobi-Anger expansion were used in the governing equations of motion. We also used the Jacobi-Anger expansion in the case of energy transfer between vibrating modes under an external force with time-varying frequency, which represents one of the future directions of research on non-ideal vibrating systems (RNIS).
comment: 6 figures, 35 pages
Optimal Co-Design of a Hybrid Energy Storage System for Truck Charging
The major challenges to battery electric truck adoption are their high cost and grid congestion.In this context, stationary energy storage systems can help mitigate both issues. Since their design and operation are strongly coupled, to make the best out of them, they should be jointly optimized. This paper presents a co-design framework for hybrid energy storage systems where their technology and sizing are optimized jointly with their operational strategies. Specifically, we consider a microgrid supporting truck chargers that consists of utility grid, solar panels, and energy storage systems including batteries, supercapacitors and flywheels. We frame the co-design problem as a mixed-integer linear program that can be solved with global optimality guarantees. We showcase our framework in a case-study of a distribution center in the Netherlands. Our results show that although the battery-only configuration is already competitive, adding supercapacitors or flywheel storage decrease total cost and increase energy sold back to the grid. Overall, the fully hybrid solution (Battery+Supercapacitors+Flywheel) offers the best outcomes, achieving the lowest overall cost (1.96\% lower compared to battery-only) and reduced grid dependency, but at a higher (2.6\%) initial investment.
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs ICASSP 10
This paper introduces an innovative error feedback framework designed to mitigate quantization noise in distributed graph filtering, where communications are constrained to quantized messages. It comes from error spectrum shaping techniques from state-space digital filters, and therefore establishes connections between quantized filtering processes over different domains. In contrast to existing error compensation methods, our framework quantitatively feeds back the quantization noise for exact compensation. We examine the framework under three key scenarios: (i) deterministic graph filtering, (ii) graph filtering over random graphs, and (iii) graph filtering with random node-asynchronous updates. Rigorous theoretical analysis demonstrates that the proposed framework significantly reduces the effect of quantization noise, and we provide closed-form solutions for the optimal error feedback coefficients. Moreover, this quantitative error feedback mechanism can be seamlessly integrated into communication-efficient decentralized optimization frameworks, enabling lower error floors. Numerical experiments validate the theoretical results, consistently showing that our method outperforms conventional quantization strategies in terms of both accuracy and robustness.
comment: Journal Paper from ICASSP 10.1109/ICASSP49660.2025.10888821
Captivity-Escape Games as a Means for Safety in Online Motion Generation
This paper presents a method that addresses the conservatism, computational effort, and limited numerical accuracy of existing frameworks and methods that ensure safety in online model-based motion generation, commonly referred to as fast and safe tracking. Computational limitations restrict online motion planning to low-fidelity models. However, planning with low-fidelity models compromises safety, as the dynamic feasibility of resulting reference trajectories is not ensured. This potentially leads to unavoidable tracking errors that may cause safety-critical constraint violations. Existing frameworks mitigate this safety risk by augmenting safety-critical constraints in motion planning by a safety margin that prevents constraint violations under worst-case tracking errors. However, the methods employed in these frameworks determine the safety margin based on a heuristically selected performance of the planning model, which likely results in overly conservative reference trajectories. Furthermore, these methods are computationally intensive, and the state-of-the-art method is limited in numerical accuracy. We adopt a different perspective and address these limitations with a method that mitigates conservatism in existing frameworks by adapting the planning model performance to a given safety margin. Our method achieves numerical accuracy and requires significantly less computation time than existing methods by leveraging a captivity-escape game, which is a specific zero-sum differential game formulated in this paper. We demonstrate our method using a numerical example and compare it to the state of the art.
Prediction of the Conditional Probability Densities of Time Interval Extrema for Risk-Sensitive Scheduling
Planning and scheduling activities in the electrical power system, such as the commitment of reserve generation, often involve statistical characterization of peak demand. Extreme Value Analysis (EVA)-based probabilistic assessments of annual peaks are widely adopted by energy regulatory and oversight agencies to determine the likelihood and severity of potential energy shortfalls. Due to the inability of classical EVA to account for peak distributions that change with annual extreme temperatures, popular existing approaches apply EVA on simulated annual peaks created by weather-dependent surrogate models using Mont\'e-Carlo simulations on a per-scenario basis. In higher time resolutions such as day-ahead scheduling, the daily peak demand changes upon various factors besides temperature, Mont\'e-Carlo experiments become intractable, and EVA-based modeling manifests as a methodological vacuum. This article explores uncharted territories and pioneers an unparalleled nonstationary EVA estimator that predicts the probable peaks of high-resolution time intervals and their corresponding conditional probability densities based on calendar information and weather conditions where historical peaks are observed. We present a case study on the determination of day-ahead scheduling capacity and demonstrate that compared to the industry approach, our approach results in a $38\%$ reduction in the yearly total committed capacity while maintaining the given risk requirement.
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimations of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their framework. In this work, we propose a novel two-stage training framework to jointly synthesize the controller and Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training data sampling strategy and a domain updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield region of attractions with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN
We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Ku\v{c}era parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that with any two of these three difficulties, a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: i) ``economic'' rewards without stabilizing effects; ii) short training horizons; and iii) uncertain systems.
Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
We present the first formal treatment of \emph{yield tokenization}, a mechanism that decomposes yield-bearing assets into principal and yield components to facilitate risk transfer and price discovery in decentralized finance (DeFi). We propose a model that characterizes yield token dynamics using stochastic differential equations. We derive a no-arbitrage pricing framework for yield tokens, enabling their use in hedging future yield volatility and managing interest rate risk in decentralized lending pools. Taking DeFi lending as our focus, we show how both borrowers and lenders can use yield tokens to achieve optimal hedging outcomes and mitigate exposure to adversarial interest rate manipulation. Furthermore, we design automated market makers (AMMs) that incorporate a menu of bonding curves to aggregate liquidity from participants with heterogeneous risk preferences. This leads to an efficient and incentive-compatible mechanism for trading yield tokens and yield futures. Building on these foundations, we propose a modular \textit{fixed-rate} lending protocol that synthesizes on-chain yield token markets and lending pools, enabling robust interest rate discovery and enhancing capital efficiency. Our work provides the theoretical underpinnings for risk management and fixed-income infrastructure in DeFi, offering practical mechanisms for stable and sustainable yield markets.
Comparative Performance Analysis of Numerical Discretization Methods for Electrochemical Models of Lithium-ion Batteries
This study evaluates numerical discretization methods for the Single Particle Model (SPM) used in electrochemical modeling. The methods include the Finite Difference Method (FDM), spectral methods, Pad\'e approximation, and parabolic approximation. Evaluation criteria are accuracy, execution time, and memory usage, aiming to guide method selection for electrochemical models. Under constant current conditions, the FDM explicit Euler and Runge-Kutta methods show significant errors, while the FDM implicit Euler method improves accuracy with more nodes. The spectral method achieves the best accuracy and convergence with as few as five nodes. The Pad\'e approximation exhibits increasing errors with higher current, and the parabolic approximation shows higher errors than the converged spectral and FDM implicit Euler methods. Under dynamic conditions, frequency domain analysis indicates that the FDM, spectral, and Pad\'e approximation methods improve high-frequency response by increasing node count or method order. In terms of execution time, the parabolic method is fastest, followed by the Pad\'e approximation. The spectral method is faster than FDM, while the FDM implicit Euler method is the slowest. Memory usage is lowest for the parabolic and Pad\'e methods, moderate for FDM, and highest for the spectral method. These findings provide practical guidance for selecting discretization methods under different operating scenarios.
comment: 37 pages, 7 figures
Prioritized Planning for Continuous-time Lifelong Multi-agent Pathfinding
Multi-agent Path Finding (MAPF) is the problem of planning collision-free movements of agents so that they get from where they are to where they need to be. Commonly, agents are located on a graph and can traverse edges. This problem has many variations and has been studied for decades. Two such variations are the continuous-time and the lifelong MAPF problems. In the former, edges have non-unit lengths and volumetric agents can traverse them at any real-valued time. In the latter, agents must attend to a continuous stream of incoming tasks. Much work has been devoted to designing solution methods within these two areas. To our knowledge, however, the combined problem of continuous-time lifelong MAPF has yet to be addressed. This work addresses continuous-time lifelong MAPF with volumetric agents by presenting the fast and sub-optimal Continuous-time Prioritized Lifelong Planner (CPLP). CPLP continuously assigns agents to tasks and computes plans using a combination of two path planners; one based on CCBS and the other based on SIPP. Experimental results with up to 800 agents on graphs with up to 12 000 vertices demonstrate practical performance, where maximum planning times fall within the available time budget. Additionally, CPLP ensures collision-free movement even when failing to meet this budget. Therefore, the robustness of CPLP highlights its potential for real-world applications.
comment: 6 pages, 3 figures
Performance Analysis of Multirate Systems: A Direct Frequency-Domain Identification Approach
Frequency-domain performance analysis of intersample behavior in sampled-data and multirate systems is challenging due to the lack of a frequency-separation principle, and systematic identification techniques are lacking. The aim of this \manuscript is to develop an efficient technique for identifying the full intersample performance in the frequency-domain for closed-loop multirate systems, in particular the Performance Frequency Gain (PFG). Through local modeling techniques, aliased frequency components are effectively disentangled when identifying the PFG, which is directly facilitated by frequency-lifting the multirate system to a multivariable time-invariant representation. The developed method accurately and directly identifies the PFG in a single identification experiment. Finally, the developed method is experimentally validated on a prototype motion system, showing accurate identification of frequency-domain representations for the multirate system, including the PFG.
A Comparative Study of SMT and MILP for the Nurse Rostering Problem
The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.
comment: 6 pages, 3 figures
Agile Decision-Making and Safety-Critical Motion Planning for Emergency Autonomous Vehicles
Efficiency is critical for autonomous vehicles (AVs), especially for emergency AVs. However, most existing methods focus on regular vehicles, overlooking the distinct strategies required by emergency vehicles to address the challenge of maximizing efficiency while ensuring safety. In this paper, we propose an Integrated Agile Decision-Making with Active and Safety-Critical Motion Planning System (IDEAM). IDEAM focuses on enabling emergency AVs, such as ambulances, to actively attain efficiency in dense traffic scenarios with safety in mind. Firstly, the speed-centric decision-making algorithm named the long short-term spatio-temporal graph-centric decision-making (LSGM) is given. LSGM comprises conditional depth-first search (C-DFS) for multiple paths generation as well as methods for speed gains and risk evaluation for path selection, which presents a robust algorithm for high efficiency and safety consideration. Secondly, with an output path from LSGM, the motion planner reconsiders environmental conditions to decide constraints states for the final planning stage, among which the lane-probing state is designed for actively attaining spatial and speed advantage. Thirdly, under the Frenet-based model predictive control (MPC) framework with final constraints state and selected path, the safety-critical motion planner employs decoupled discrete control barrier functions (DCBFs) and linearized discrete-time high-order control barrier functions (DHOCBFs) to model the constraints associated with different driving behaviors, making the optimal optimization problem convex. Finally, we extensively validate our system using scenarios from a randomly synthetic dataset, demonstrating its capability to achieve speed benefits and assure safety simultaneously.
Meta-Learning for Adaptive Control with Automated Mirror Descent
Adaptive control achieves concurrent parameter learning and stable control under uncertainties that are linearly parameterized with known nonlinear features. Nonetheless, it is often difficult to obtain such nonlinear features. To address this difficulty, recent progress has been made in integrating meta-learning with adaptive control to learn such nonlinear features from data. However, these meta-learning-based control methods rely on classical adaptation laws using gradient descent, which is confined to the Euclidean geometry. In this paper, we propose a novel method that combines meta-learning and adaptation laws based on mirror descent, a popular generalization of gradient descent, which takes advantage of the potentially non-Euclidean geometry of the parameter space. In our approach, meta-learning not only learns the nonlinear features but also searches for a suitable mirror-descent potential function that optimizes control performance. Through numerical simulations, we demonstrate the effectiveness of the proposed method in learning efficient representations and real-time tracking control performance under uncertain dynamics.
Neural Operators for Predictor Feedback Control of Nonlinear Delay Systems
Predictor feedback designs are critical for delay-compensating controllers in nonlinear systems. However, these designs are limited in practical applications as predictors cannot be directly implemented, but require numerical approximation schemes, which become computationally prohibitive when system dynamics are expensive to compute. To address this challenge, we recast the predictor design as an operator learning problem, and learn the predictor mapping via a neural operator. We prove the existence of an arbitrarily accurate neural operator approximation of the predictor operator. Under the approximated predictor, we achieve semiglobal practical stability of the closed-loop nonlinear delay system. The estimate is semiglobal in a unique sense - one can enlarge the set of initial states as desired, though this increases the difficulty of training a neural operator, which appears practically in the stability estimate. Furthermore, our analysis holds for any black-box predictor satisfying the universal approximation error bound. We demonstrate the approach by controlling a 5-link robotic manipulator with different neural operator models, achieving significant speedups compared to classic predictor feedback schemes while maintaining closed-loop stability.
comment: 26 pages. Learning for Dynamics and Control 2025
Structure and Control of Biology-inspired Networks
There is increasing interest in developing the theoretical foundations of networked control systems that illuminate how brain networks function so as to enable sensory perception, control of movement, memory and all the operations that are needed for animals to survive. The present paper proposes a biologically inspired network model featuring dynamic connections regulated by Hebbian learning. Drawing on the machinery of graph theory and classical control we show that our novel nonlinear model exhibits such biologically plausible features as bounded evolution, stability, resilience, and a kind of structural stability -- meaning that perturbations of the model parameters leave the essential properties of the model in tact. The proposed network model involves generalized cactus graphs with multiple control input nodes, and it is shown that the properties of the network are resilient to various changes in network topology provided these changes preserve the generalized cactus structure. A particular example described in what follows is an idealized network model of the visual system of a macaque monkey. The model displays resilience to network disruptions such as might occur in a living organism due to disease or injury. A different model of the same type provides an example of a system that can perform data classification.
comment: 12 pages
Systems and Control (EESS)
SIL Allocation for Mitigation Safety Functions
SIL (Safety Integrity Level) allocation plays a pivotal role in evaluating the significance of Safety Functions (SFs) within high-risk industries. The outcomes of a SIL allocation study determine the design specifications necessary to uphold the Probability of Failure on Demand (PFD) below permissible limits, thus managing risk effectively. While extensive research has focused on SIL allocation for preventive SFs, there is a noticeable gap in attention towards mitigation SFs. To address this gap, this paper discusses the shortcomings of current methods and proposes a new approach to overcome them. The principles of the proposed method are substantiated by detailed mathematical formulation and the practical application of the method is demonstrated through a case study in a road tunnel project.
Experimental Covert Communication Using Software-Defined Radio
The fundamental information-theoretic limits of covert, or low probability of detection (LPD), communication have been extensively studied for over a decade, resulting in the square root law (SRL): only $L\sqrt{n}$ covert bits can be reliably transmitted over time-bandwidth product $n$, for constant $L>0$. Transmitting more either results in detection or decoding errors. The SRL imposes significant constraints on hardware realization of provably-secure covert communication. Thus, experimental validation of covert communication is underexplored: to date, only two experimental studies of SRL-based covert communication are available, both focusing on optical channels. Here, we report our initial results demonstrating the provably-secure covert radio-frequency (RF) communication using software-defined radios (SDRs). These validate theoretical predictions, open practical avenues for implementing covert communication systems, as well as raise future research questions.
comment: 8 pages, 5 figures
Second-order AAA algorithms for structured data-driven modeling
The data-driven modeling of dynamical systems has become an essential tool for the construction of accurate computational models from real-world data. In this process, the inherent differential structures underlying the considered physical phenomena are often neglected making the reinterpretation of the learned models in a physically meaningful sense very challenging. In this work, we present three data-driven modeling approaches for the construction of dynamical systems with second-order differential structure directly from frequency domain data. Based on the second-order structured barycentric form, we extend the well-known Adaptive Antoulas-Anderson algorithm to the case of second-order systems. Depending on the available computational resources, we propose variations of the proposed method that prioritize either higher computation speed or greater modeling accuracy, and we present a theoretical analysis for the expected accuracy and performance of the proposed methods. Three numerical examples demonstrate the effectiveness of our new structured approaches in comparison to classical unstructured data-driven modeling.
comment: 37 pages, 6 figures, 3 tables
Second-Order-Cone Formulations of Power Flow for Topology Optimization
Optimization problems that involve topology optimization in scenarios with large scale outages, such as post-disaster restoration or public safety power shutoff planning, are very challenging to solve. Using simple power flow representations such as DC power flow or network flow models results in low quality solutions which requires significantly higher-than-predicted load shed to become AC feasible. Recent work has shown that formulations based on the Second Order Cone (SOC) power flow formulation find very high quality solutions with low load shed, but the computational burden of these formulations remains a significant challenge. With the aim of reducing computational time while maintaining high solution quality, this work explores formulations which replace the conic constraints with a small number of linear cuts. The goal of this approach is not to find an exact power flow solution, but rather to identify good binary decisions, where the power flow can be resolved after the binary variables are fixed. We find that a simple reformulation of the Second Order Cone Optimal Power Shutoff problem can greatly improve the solution speed, but that a full linearization of the SOC voltage cone equation results in an overestimation of the amount of power that can be delivered to loads.
Active inference as a unified model of collision avoidance behavior in human drivers
Collision avoidance -- involving a rapid threat detection and quick execution of the appropriate evasive maneuver -- is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
Bregman Centroid Guided Cross-Entropy Method
The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages $\textit{Bregman centroids}$ for principled information aggregation and diversity control. $\textbf{$\mathcal{BC}$-EvoCEM}$ computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\textbf{$\mathcal{BC}$-EvoCEM}$ integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\textbf{$\mathcal{BC}$-EvoCEM}$ enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
Optimal Coordination of Flexible DERs in Local Energy and Flexibility Markets to Ensure Social Equity
Local electricity markets offer a promising solution for integrating renewable energy sources and other distributed energy resources (DERs) into distribution networks. These markets enable the effective utilization of flexible resources by facilitating coordination among various agents. Beyond technical and economic considerations, addressing social equity within these local communities is critical and requires dedicated attention in market-clearing frameworks. This paper proposes a social equity-based market-clearing framework for the optimal management of DERs' energy and flexibility within local communities. The proposed framework incorporates consumers' energy burden to ensure fair pricing in energy market clearance. Furthermore, to ensure equity during unbalanced operating conditions, flexible resources are managed in the local flexibility market, ensuring that all participants can trade power fairly under network disturbances. The model is formulated as a second-order cone programming (SOCP) optimization and validated on the IEEE 33-bus test distribution network.
comment: Accepted for presenting in IEEE PES General Meeting 2025
Hybrid SIS Dynamics for Demand Modeling of Frequently Updated Products
We propose a hybrid spreading process model to capture the dynamics of demand for software-based products. We introduce discontinuous jumps in the state to model sudden surges in demand that can be seen immediately after a product update is released. After each update, the modeled demand evolves according to a continuous-time susceptible-infected-susceptible (SIS) epidemic model. We identify the necessary and sufficient conditions for estimating the hybrid model's parameters for an arbitrary finite number of sequential updates. We verify the parameter estimation conditions in simulation, and evaluate how the estimation of these parameters is impacted by the presence of observation and process noise. We then validate our model by applying our estimation method to daily user engagement data for a regularly updating software product, the live-service video game `Apex Legends.'
Systematic Hazard Analysis for Frontier AI using STPA
All of the frontier AI companies have published safety frameworks where they define capability thresholds and risk mitigations that determine how they will safely develop and deploy their models. Adoption of systematic approaches to risk modelling, based on established practices used in safety-critical industries, has been recommended, however frontier AI companies currently do not describe in detail any structured approach to identifying and analysing hazards. STPA (Systems-Theoretic Process Analysis) is a systematic methodology for identifying how complex systems can become unsafe, leading to hazards. It achieves this by mapping out controllers and controlled processes then analysing their interactions and feedback loops to understand how harmful outcomes could occur (Leveson & Thomas, 2018). We evaluate STPA's ability to broaden the scope, improve traceability and strengthen the robustness of safety assurance for frontier AI systems. Applying STPA to the threat model and scenario described in 'A Sketch of an AI Control Safety Case' (Korbak et al., 2025), we derive a list of Unsafe Control Actions. From these we select a subset and explore the Loss Scenarios that lead to them if left unmitigated. We find that STPA is able to identify causal factors that may be missed by unstructured hazard analysis methodologies thereby improving robustness. We suggest STPA could increase the safety assurance of frontier AI when used to complement or check coverage of existing AI governance techniques including capability thresholds, model evaluations and emergency procedures. The application of a systematic methodology supports scalability by increasing the proportion of the analysis that could be conducted by LLMs, reducing the burden on human domain experts.
comment: 29 pages, 5 figures, 7 tables
ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real
Model-free reinforcement learning has emerged as a powerful method for developing robust robot control policies capable of navigating through complex and unstructured environments. The effectiveness of these methods hinges on two essential elements: (1) the use of massively parallel physics simulations to expedite policy training, and (2) an environment generator tasked with crafting sufficiently challenging yet attainable environments to facilitate continuous policy improvement. Existing methods of outdoor environment generation often rely on heuristics constrained by a set of parameters, limiting the diversity and realism. In this work, we introduce ADEPT, a novel \textbf{A}daptive \textbf{D}iffusion \textbf{E}nvironment for \textbf{P}olicy \textbf{T}ransfer in the zero-shot sim-to-real fashion that leverages Denoising Diffusion Probabilistic Models to dynamically expand existing training environments by adding more diverse and complex environments adaptive to the current policy. ADEPT guides the diffusion model's generation process through initial noise optimization, blending noise-corrupted environments from existing training environments weighted by the policy's performance in each corresponding environment. By manipulating the noise corruption level, ADEPT seamlessly transitions between generating similar environments for policy fine-tuning and novel ones to expand training diversity. To benchmark ADEPT in off-road navigation, we propose a fast and effective multi-layer map representation for wild environment generation. Our experiments show that the policy trained by ADEPT outperforms both procedural generated and natural environments, along with popular navigation methods.
Generalized Super-Twisting Observer for a class of interconnected nonlinear systems with uncertainties
The Generalized Super-Twisting Observer (GSTO) is extended for a strongly observable class of nonlinearly interconnected systems with bounded uncertainties/perturbations. A nonsmooth strong Lyapunov function is used to prove the finite-time convergence of the proposed observer to the true system's trajectories, in the presence of the uncertainties. A case study on the interaction between two food production systems is presented, comparing the proposed observer with the High Gain observer. The results emphasize the critical role of the GSTO's discontinuous term in achieving exact estimation.
Smooth Logic Constraints in Nonlinear Optimization and Optimal Control Problems
In some optimal control problems, complex relationships between states and inputs cannot be easily represented using continuous constraints, necessitating the use of discrete logic instead. This paper presents a method for incorporating such logic constraints directly within continuous optimization frameworks, eliminating the need for binary variables or specialized solvers. Our approach reformulates arbitrary logic constraints under minimal assumptions as max-min constraints, which are then smoothed by introducing auxiliary variables into the optimization problem. When these reformulated constraints are satisfied, they guarantee that the original logical conditions hold, ensuring correctness in the optimization process. We demonstrate the effectiveness of this method on two planar quadrotor control tasks with complex logic constraints. Compared to existing techniques for encoding logic in continuous optimization, our approach achieves faster computational performance and improved convergence to feasible solutions.
comment: 6 pages, 7 figures, submitted to the 2025 IEEE Conference on Decision and Control
Update-Aware Robust Optimal Model Predictive Control for Nonlinear Systems
Robust optimal or min-max model predictive control (MPC) approaches aim to guarantee constraint satisfaction over a known, bounded uncertainty set while minimizing a worst-case performance bound. Traditionally, these methods compute a trajectory that meets the desired properties over a fixed prediction horizon, apply a portion of the resulting input, and then re-solve the MPC problem using newly obtained measurements at the next time step. However, this approach fails to account for the fact that the control trajectory will be updated in the future, potentially leading to conservative designs. In this paper, we present a novel update-aware robust optimal MPC algorithm for decreasing horizon problems on nonlinear systems that explicitly accounts for future control trajectory updates. This additional insight allows our method to provably expand the feasible solution set and guarantee improved worst-case performance bounds compared to existing techniques. Our approach formulates the trajectory generation problem as a sequence of nested existence-constrained semi-infinite programs (SIPs), which can be efficiently solved using local reduction techniques. To demonstrate its effectiveness, we evaluate our approach on a planar quadrotor problem, where it clearly outperforms an equivalent method that does not account for future updates at the cost of increased computation time.
comment: 6 pages, 2 figures, to be published in the IEEE Control System Letters
A Vertical Approach to Designing and Managing Sustainable Heterogeneous Edge Data Centers
The increasing demand for Artificial Intelligence (AI) computing poses significant environmental challenges, with both operational and embodied carbon emissions becoming major contributors. This paper presents a carbon-aware holistic methodology for designing and managing sustainable Edge Data Centers (EDCs), based on three design principles that challenge the state-of-the-art optimization paradigms. Our approach employs vertical integration across the architecture, system, and runtime layers, balances operational and embodied carbon emissions while considering EDC performance as a co-optimization objective, rather than a constraint. At the architecture level, we propose carbon-aware and approximate accelerator designs to reduce embodied carbon. At the system level, we enhance resource utilization and adapt to real-time carbon intensity variations to minimize operational emissions. Finally, at the runtime level, we develop dynamic scheduling frameworks that adjust execution, based on energy constraints and carbon intensity.
comment: IEEE Computer Society Annual Symposium on VLSI (ISVLSI) 2025
Interpretable reinforcement learning for heat pump control through asymmetric differentiable decision trees
In recent years, deep reinforcement learning (DRL) algorithms have gained traction in home energy management systems. However, their adoption by energy management companies remains limited due to the black-box nature of DRL, which fails to provide transparent decision-making feedback. To address this, explainable reinforcement learning (XRL) techniques have emerged, aiming to make DRL decisions more transparent. Among these, soft differential decision tree (DDT) distillation provides a promising approach due to the clear decision rules they are based on, which can be efficiently computed. However, achieving high performance often requires deep, and completely full, trees, which reduces interpretability. To overcome this, we propose a novel asymmetric soft DDT construction method. Unlike traditional soft DDTs, our approach adaptively constructs trees by expanding nodes only when necessary. This improves the efficient use of decision nodes, which require a predetermined depth to construct full symmetric trees, enhancing both interpretability and performance. We demonstrate the potential of asymmetric DDTs to provide transparent, efficient, and high-performing decision-making in home energy management systems.
comment: 7 pages, 3 figures, conference
Equivalence of Left- and Right-Invariant Extended Kalman Filters on Matrix Lie Groups
This paper derives the extended Kalman filter (EKF) for continuous-time systems on matrix Lie groups observed through discrete-time measurements. By modeling the system noise on the Lie algebra and adopting a Stratonovich interpretation for the stochastic differential equation (SDE), we ensure that solutions remain on the manifold. The derivation of the filter follows classical EKF principles, naturally integrating a necessary full-order covariance reset post-measurement update. A key contribution is proving that this full-order covariance reset guarantees that the Lie-group-valued state estimate is invariant to whether a left- or right-invariant error definition is used in the EKF. Monte Carlo simulations of the aided inertial navigation problem validate the invariance property and confirm its absence when employing reduced-order covariance resets.
comment: This work has been submitted to the IEEE for possible publication
A State of the Art on Recent Progress and Emerging Challenges on Energy Transfer Between Vibrating Modes Under an External Mechanical Force With Time-Varying Frequency From 2020 to 2025
In this paper, we discuss an example of current importance with a future perspective in engineering, in which excitation sources always have limited power, limited inertia, and their frequencies vary according to the instantaneous state of the vibrating system. Practical examples of non-ideal systems are considered. The most common phenomenon for this kind of system is discussed. The period considered is from 2020 to 2025. The specific properties of various models are also discussed. Directions for future investigations are provided. In this paper, the authors revisited some publications based on the assumption that the external excitations are produced by non-ideal sources (RNIS), that is, with limited power supply. Among these applications, nonlinear phenomena such as the Sommerfeld effect and saturation phenomenon were observed, considering fractional damping. Energy harvesters and the Jacobi-Anger expansion were used in the governing equations of motion. We also used the Jacobi-Anger expansion in the case of energy transfer between vibrating modes under an external force with time-varying frequency, which represents one of the future directions of research on non-ideal vibrating systems (RNIS).
comment: 6 figures, 35 pages
Optimal Co-Design of a Hybrid Energy Storage System for Truck Charging
The major challenges to battery electric truck adoption are their high cost and grid congestion.In this context, stationary energy storage systems can help mitigate both issues. Since their design and operation are strongly coupled, to make the best out of them, they should be jointly optimized. This paper presents a co-design framework for hybrid energy storage systems where their technology and sizing are optimized jointly with their operational strategies. Specifically, we consider a microgrid supporting truck chargers that consists of utility grid, solar panels, and energy storage systems including batteries, supercapacitors and flywheels. We frame the co-design problem as a mixed-integer linear program that can be solved with global optimality guarantees. We showcase our framework in a case-study of a distribution center in the Netherlands. Our results show that although the battery-only configuration is already competitive, adding supercapacitors or flywheel storage decrease total cost and increase energy sold back to the grid. Overall, the fully hybrid solution (Battery+Supercapacitors+Flywheel) offers the best outcomes, achieving the lowest overall cost (1.96\% lower compared to battery-only) and reduced grid dependency, but at a higher (2.6\%) initial investment.
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs ICASSP 10
This paper introduces an innovative error feedback framework designed to mitigate quantization noise in distributed graph filtering, where communications are constrained to quantized messages. It comes from error spectrum shaping techniques from state-space digital filters, and therefore establishes connections between quantized filtering processes over different domains. In contrast to existing error compensation methods, our framework quantitatively feeds back the quantization noise for exact compensation. We examine the framework under three key scenarios: (i) deterministic graph filtering, (ii) graph filtering over random graphs, and (iii) graph filtering with random node-asynchronous updates. Rigorous theoretical analysis demonstrates that the proposed framework significantly reduces the effect of quantization noise, and we provide closed-form solutions for the optimal error feedback coefficients. Moreover, this quantitative error feedback mechanism can be seamlessly integrated into communication-efficient decentralized optimization frameworks, enabling lower error floors. Numerical experiments validate the theoretical results, consistently showing that our method outperforms conventional quantization strategies in terms of both accuracy and robustness.
comment: Journal Paper from ICASSP 10.1109/ICASSP49660.2025.10888821
Captivity-Escape Games as a Means for Safety in Online Motion Generation
This paper presents a method that addresses the conservatism, computational effort, and limited numerical accuracy of existing frameworks and methods that ensure safety in online model-based motion generation, commonly referred to as fast and safe tracking. Computational limitations restrict online motion planning to low-fidelity models. However, planning with low-fidelity models compromises safety, as the dynamic feasibility of resulting reference trajectories is not ensured. This potentially leads to unavoidable tracking errors that may cause safety-critical constraint violations. Existing frameworks mitigate this safety risk by augmenting safety-critical constraints in motion planning by a safety margin that prevents constraint violations under worst-case tracking errors. However, the methods employed in these frameworks determine the safety margin based on a heuristically selected performance of the planning model, which likely results in overly conservative reference trajectories. Furthermore, these methods are computationally intensive, and the state-of-the-art method is limited in numerical accuracy. We adopt a different perspective and address these limitations with a method that mitigates conservatism in existing frameworks by adapting the planning model performance to a given safety margin. Our method achieves numerical accuracy and requires significantly less computation time than existing methods by leveraging a captivity-escape game, which is a specific zero-sum differential game formulated in this paper. We demonstrate our method using a numerical example and compare it to the state of the art.
Prediction of the Conditional Probability Densities of Time Interval Extrema for Risk-Sensitive Scheduling
Planning and scheduling activities in the electrical power system, such as the commitment of reserve generation, often involve statistical characterization of peak demand. Extreme Value Analysis (EVA)-based probabilistic assessments of annual peaks are widely adopted by energy regulatory and oversight agencies to determine the likelihood and severity of potential energy shortfalls. Due to the inability of classical EVA to account for peak distributions that change with annual extreme temperatures, popular existing approaches apply EVA on simulated annual peaks created by weather-dependent surrogate models using Mont\'e-Carlo simulations on a per-scenario basis. In higher time resolutions such as day-ahead scheduling, the daily peak demand changes upon various factors besides temperature, Mont\'e-Carlo experiments become intractable, and EVA-based modeling manifests as a methodological vacuum. This article explores uncharted territories and pioneers an unparalleled nonstationary EVA estimator that predicts the probable peaks of high-resolution time intervals and their corresponding conditional probability densities based on calendar information and weather conditions where historical peaks are observed. We present a case study on the determination of day-ahead scheduling capacity and demonstrate that compared to the industry approach, our approach results in a $38\%$ reduction in the yearly total committed capacity while maintaining the given risk requirement.
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimations of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their framework. In this work, we propose a novel two-stage training framework to jointly synthesize the controller and Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training data sampling strategy and a domain updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield region of attractions with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN
We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Ku\v{c}era parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that with any two of these three difficulties, a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: i) ``economic'' rewards without stabilizing effects; ii) short training horizons; and iii) uncertain systems.
Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
We present the first formal treatment of \emph{yield tokenization}, a mechanism that decomposes yield-bearing assets into principal and yield components to facilitate risk transfer and price discovery in decentralized finance (DeFi). We propose a model that characterizes yield token dynamics using stochastic differential equations. We derive a no-arbitrage pricing framework for yield tokens, enabling their use in hedging future yield volatility and managing interest rate risk in decentralized lending pools. Taking DeFi lending as our focus, we show how both borrowers and lenders can use yield tokens to achieve optimal hedging outcomes and mitigate exposure to adversarial interest rate manipulation. Furthermore, we design automated market makers (AMMs) that incorporate a menu of bonding curves to aggregate liquidity from participants with heterogeneous risk preferences. This leads to an efficient and incentive-compatible mechanism for trading yield tokens and yield futures. Building on these foundations, we propose a modular \textit{fixed-rate} lending protocol that synthesizes on-chain yield token markets and lending pools, enabling robust interest rate discovery and enhancing capital efficiency. Our work provides the theoretical underpinnings for risk management and fixed-income infrastructure in DeFi, offering practical mechanisms for stable and sustainable yield markets.
Comparative Performance Analysis of Numerical Discretization Methods for Electrochemical Models of Lithium-ion Batteries
This study evaluates numerical discretization methods for the Single Particle Model (SPM) used in electrochemical modeling. The methods include the Finite Difference Method (FDM), spectral methods, Pad\'e approximation, and parabolic approximation. Evaluation criteria are accuracy, execution time, and memory usage, aiming to guide method selection for electrochemical models. Under constant current conditions, the FDM explicit Euler and Runge-Kutta methods show significant errors, while the FDM implicit Euler method improves accuracy with more nodes. The spectral method achieves the best accuracy and convergence with as few as five nodes. The Pad\'e approximation exhibits increasing errors with higher current, and the parabolic approximation shows higher errors than the converged spectral and FDM implicit Euler methods. Under dynamic conditions, frequency domain analysis indicates that the FDM, spectral, and Pad\'e approximation methods improve high-frequency response by increasing node count or method order. In terms of execution time, the parabolic method is fastest, followed by the Pad\'e approximation. The spectral method is faster than FDM, while the FDM implicit Euler method is the slowest. Memory usage is lowest for the parabolic and Pad\'e methods, moderate for FDM, and highest for the spectral method. These findings provide practical guidance for selecting discretization methods under different operating scenarios.
comment: 37 pages, 7 figures
Prioritized Planning for Continuous-time Lifelong Multi-agent Pathfinding
Multi-agent Path Finding (MAPF) is the problem of planning collision-free movements of agents so that they get from where they are to where they need to be. Commonly, agents are located on a graph and can traverse edges. This problem has many variations and has been studied for decades. Two such variations are the continuous-time and the lifelong MAPF problems. In the former, edges have non-unit lengths and volumetric agents can traverse them at any real-valued time. In the latter, agents must attend to a continuous stream of incoming tasks. Much work has been devoted to designing solution methods within these two areas. To our knowledge, however, the combined problem of continuous-time lifelong MAPF has yet to be addressed. This work addresses continuous-time lifelong MAPF with volumetric agents by presenting the fast and sub-optimal Continuous-time Prioritized Lifelong Planner (CPLP). CPLP continuously assigns agents to tasks and computes plans using a combination of two path planners; one based on CCBS and the other based on SIPP. Experimental results with up to 800 agents on graphs with up to 12 000 vertices demonstrate practical performance, where maximum planning times fall within the available time budget. Additionally, CPLP ensures collision-free movement even when failing to meet this budget. Therefore, the robustness of CPLP highlights its potential for real-world applications.
comment: 6 pages, 3 figures
Performance Analysis of Multirate Systems: A Direct Frequency-Domain Identification Approach
Frequency-domain performance analysis of intersample behavior in sampled-data and multirate systems is challenging due to the lack of a frequency-separation principle, and systematic identification techniques are lacking. The aim of this \manuscript is to develop an efficient technique for identifying the full intersample performance in the frequency-domain for closed-loop multirate systems, in particular the Performance Frequency Gain (PFG). Through local modeling techniques, aliased frequency components are effectively disentangled when identifying the PFG, which is directly facilitated by frequency-lifting the multirate system to a multivariable time-invariant representation. The developed method accurately and directly identifies the PFG in a single identification experiment. Finally, the developed method is experimentally validated on a prototype motion system, showing accurate identification of frequency-domain representations for the multirate system, including the PFG.
A Comparative Study of SMT and MILP for the Nurse Rostering Problem
The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.
comment: 6 pages, 3 figures
Agile Decision-Making and Safety-Critical Motion Planning for Emergency Autonomous Vehicles
Efficiency is critical for autonomous vehicles (AVs), especially for emergency AVs. However, most existing methods focus on regular vehicles, overlooking the distinct strategies required by emergency vehicles to address the challenge of maximizing efficiency while ensuring safety. In this paper, we propose an Integrated Agile Decision-Making with Active and Safety-Critical Motion Planning System (IDEAM). IDEAM focuses on enabling emergency AVs, such as ambulances, to actively attain efficiency in dense traffic scenarios with safety in mind. Firstly, the speed-centric decision-making algorithm named the long short-term spatio-temporal graph-centric decision-making (LSGM) is given. LSGM comprises conditional depth-first search (C-DFS) for multiple paths generation as well as methods for speed gains and risk evaluation for path selection, which presents a robust algorithm for high efficiency and safety consideration. Secondly, with an output path from LSGM, the motion planner reconsiders environmental conditions to decide constraints states for the final planning stage, among which the lane-probing state is designed for actively attaining spatial and speed advantage. Thirdly, under the Frenet-based model predictive control (MPC) framework with final constraints state and selected path, the safety-critical motion planner employs decoupled discrete control barrier functions (DCBFs) and linearized discrete-time high-order control barrier functions (DHOCBFs) to model the constraints associated with different driving behaviors, making the optimal optimization problem convex. Finally, we extensively validate our system using scenarios from a randomly synthetic dataset, demonstrating its capability to achieve speed benefits and assure safety simultaneously.
Meta-Learning for Adaptive Control with Automated Mirror Descent
Adaptive control achieves concurrent parameter learning and stable control under uncertainties that are linearly parameterized with known nonlinear features. Nonetheless, it is often difficult to obtain such nonlinear features. To address this difficulty, recent progress has been made in integrating meta-learning with adaptive control to learn such nonlinear features from data. However, these meta-learning-based control methods rely on classical adaptation laws using gradient descent, which is confined to the Euclidean geometry. In this paper, we propose a novel method that combines meta-learning and adaptation laws based on mirror descent, a popular generalization of gradient descent, which takes advantage of the potentially non-Euclidean geometry of the parameter space. In our approach, meta-learning not only learns the nonlinear features but also searches for a suitable mirror-descent potential function that optimizes control performance. Through numerical simulations, we demonstrate the effectiveness of the proposed method in learning efficient representations and real-time tracking control performance under uncertain dynamics.
Neural Operators for Predictor Feedback Control of Nonlinear Delay Systems
Predictor feedback designs are critical for delay-compensating controllers in nonlinear systems. However, these designs are limited in practical applications as predictors cannot be directly implemented, but require numerical approximation schemes, which become computationally prohibitive when system dynamics are expensive to compute. To address this challenge, we recast the predictor design as an operator learning problem, and learn the predictor mapping via a neural operator. We prove the existence of an arbitrarily accurate neural operator approximation of the predictor operator. Under the approximated predictor, we achieve semiglobal practical stability of the closed-loop nonlinear delay system. The estimate is semiglobal in a unique sense - one can enlarge the set of initial states as desired, though this increases the difficulty of training a neural operator, which appears practically in the stability estimate. Furthermore, our analysis holds for any black-box predictor satisfying the universal approximation error bound. We demonstrate the approach by controlling a 5-link robotic manipulator with different neural operator models, achieving significant speedups compared to classic predictor feedback schemes while maintaining closed-loop stability.
comment: 26 pages. Learning for Dynamics and Control 2025
Structure and Control of Biology-inspired Networks
There is increasing interest in developing the theoretical foundations of networked control systems that illuminate how brain networks function so as to enable sensory perception, control of movement, memory and all the operations that are needed for animals to survive. The present paper proposes a biologically inspired network model featuring dynamic connections regulated by Hebbian learning. Drawing on the machinery of graph theory and classical control we show that our novel nonlinear model exhibits such biologically plausible features as bounded evolution, stability, resilience, and a kind of structural stability -- meaning that perturbations of the model parameters leave the essential properties of the model in tact. The proposed network model involves generalized cactus graphs with multiple control input nodes, and it is shown that the properties of the network are resilient to various changes in network topology provided these changes preserve the generalized cactus structure. A particular example described in what follows is an idealized network model of the visual system of a macaque monkey. The model displays resilience to network disruptions such as might occur in a living organism due to disease or injury. A different model of the same type provides an example of a system that can perform data classification.
comment: 12 pages
Robotics
Test Automation for Interactive Scenarios via Promptable Traffic Simulation CVPR 2025
Autonomous vehicle (AV) planners must undergo rigorous evaluation before widespread deployment on public roads, particularly to assess their robustness against the uncertainty of human behaviors. While recent advancements in data-driven scenario generation enable the simulation of realistic human behaviors in interactive settings, leveraging these models to construct comprehensive tests for AV planners remains an open challenge. In this work, we introduce an automated method to efficiently generate realistic and safety-critical human behaviors for AV planner evaluation in interactive scenarios. We parameterize complex human behaviors using low-dimensional goal positions, which are then fed into a promptable traffic simulator, ProSim, to guide the behaviors of simulated agents. To automate test generation, we introduce a prompt generation module that explores the goal domain and efficiently identifies safety-critical behaviors using Bayesian optimization. We apply our method to the evaluation of an optimization-based planner and demonstrate its effectiveness and efficiency in automatically generating diverse and realistic driving behaviors across scenarios with varying initial conditions.
comment: Accepted by CVPR 2025 Workshop Data-Driven Autonomous Driving Simulation (track 1)
OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation
We introduce OG-VLA, a novel architecture and learning framework that combines the generalization strengths of Vision Language Action models (VLAs) with the robustness of 3D-aware policies. We address the challenge of mapping natural language instructions and multi-view RGBD observations to quasi-static robot actions. 3D-aware robot policies achieve state-of-the-art performance on precise robot manipulation tasks, but struggle with generalization to unseen instructions, scenes, and objects. On the other hand, VLAs excel at generalizing across instructions and scenes, but can be sensitive to camera and robot pose variations. We leverage prior knowledge embedded in language and vision foundation models to improve generalization of 3D-aware keyframe policies. OG-VLA projects input observations from diverse views into a point cloud which is then rendered from canonical orthographic views, ensuring input view invariance and consistency between input and output spaces. These canonical views are processed with a vision backbone, a Large Language Model (LLM), and an image diffusion model to generate images that encode the next position and orientation of the end-effector on the input scene. Evaluations on the Arnold and Colosseum benchmarks demonstrate state-of-the-art generalization to unseen environments, with over 40% relative improvements while maintaining robust performance in seen settings. We also show real-world adaption in 3 to 5 demonstrations along with strong generalization. Videos and resources at https://og-vla.github.io/
comment: 17 pages
HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control
We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such as opening cabinets, sweeping trash, and rearranging pillows. Across tasks, HoMeR achieves an overall success rate of 79.17% using just 20 demonstrations per task, outperforming the next best baseline by 29.17 on average. HoMeR is also compatible with vision-language models and can leverage their internet-scale priors to better generalize to novel object appearances, layouts, and cluttered scenes. In summary, HoMeR moves beyond tabletop settings and demonstrates a scalable path toward sample-efficient, generalizable manipulation in everyday indoor spaces. Code, videos, and supplementary material are available at: http://homer-manip.github.io
Humanoid World Models: Open World Foundation Models for Humanoid Robotics
Humanoid robots have the potential to perform complex tasks in human centered environments but require robust predictive models to reason about the outcomes of their actions. We introduce Humanoid World Models (HWM) a family of lightweight open source video based models that forecast future egocentric observations conditioned on actions. We train two types of generative models Masked Transformers and FlowMatching on 100 hours of humanoid demonstrations. Additionally we explore architectural variants with different attention mechanisms and parameter sharing strategies. Our parameter sharing techniques reduce model size by 33 to 53 with minimal impact on performance or visual fidelity. HWM is designed to be trained and deployed in practical academic and small lab settings such as 1 to 2 GPUs.
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
To ensure learned controllers comply with safety and reliability requirements for reinforcement learning in real-world settings remains challenging. Traditional safety assurance approaches, such as state avoidance and constrained Markov decision processes, often inadequately capture trajectory requirements or may result in overly conservative behaviors. To address these limitations, recent studies advocate the use of formal specification languages such as linear temporal logic (LTL), enabling the derivation of correct-by-construction learning objectives from the specified requirements. However, the sparse rewards associated with LTL specifications make learning extremely difficult, whereas dense heuristic-based rewards risk compromising correctness. In this work, we propose the first method, to our knowledge, that integrates LTL with differentiable simulators, facilitating efficient gradient-based learning directly from LTL specifications by coupling with differentiable paradigms. Our approach introduces soft labeling to achieve differentiable rewards and states, effectively mitigating the sparse-reward issue intrinsic to LTL without compromising objective correctness. We validate the efficacy of our method through experiments, demonstrating significant improvements in both reward attainment and training time compared to the discrete methods.
Standing Tall: Robust Fall Prediction for Bipedal Robots
This paper extends the fall prediction algorithm from Mungai et al.(2024) to a real-time/online setting, implemented in both hardware and simulation. This yields results comparable to the offline version, maintaining a zero false positive rate, sufficient lead time, and accurate lead time prediction. Additionally, it achieves a high recovery rate. The paper also evaluates the fall prediction algorithm against omnidirectional faults and introduces an improved algorithm capable of reliably predicting falls and lead times across a wider range of faults in full-sized robots. Compared to Mungai et al.(2024), the proposed algorithm performs significantly better across all metrics, such as false positive rate, lead time, accuracy, and response time, demonstrating the algorithm's efficacy for real-time fall prediction in bipedal robots.
comment: Submitted to Humanoids 2025. This work has been submitted to the IEEE for possible publication
$\text{TREX}^2$: Dual-Reconstruction Framework for Teleoperated-Robot with EXtended Reality
Robot teleoperation with extended reality (XR teleoperation) enables intuitive interaction by allowing remote robots to mimic user motions with real-time 3D feedback. However, existing systems face significant motion-to-motion (M2M) latency -- the delay between the user's latest motion and the corresponding robot feedback -- leading to high teleoperation error and mission completion time. This issue stems from the system's exclusive reliance on network communication, making it highly vulnerable to network degradation. To address these challenges, we introduce $\text{TREX}^2$, the first end-to-end, fully open-sourced XR teleoperation framework that decouples robot control and XR visualization from network dependencies. $\text{TREX}^2$ leverages local sensing data to reconstruct delayed or missing information of the counterpart, thereby significantly reducing network-induced issues. This approach allows both the XR and robot to run concurrently with network transmission while maintaining high robot planning accuracy. $\text{TREX}^2$ also features contention-aware scheduling to mitigate GPU contention and bandwidth-adaptive point cloud scaling to cope with limited bandwidth. We implement $\text{TREX}^2$ across three hardware settings, including simulated and physical robots, and evaluate it on 9,500 real-world teleoperation trials from the RoboSet dataset \cite{kumar2024robohive}, covering single- and multi-step missions. Compared to state-of-the-art XR teleoperation frameworks, $\text{TREX}^2$ reduces teleoperation error by up to 69.8% on WLAN and 73.1% on cellular networks with only 6.7% maximum runtime overhead. It also improves completion time by up to 47.7%, enabling smoother teleoperation. A real-world case study on ten stationary and mobile missions further shows $\text{TREX}^2$ achieves up to 37.7% faster completion while lowering average teleoperation error by up to 57.2%.
iRonCub 3: The Jet-Powered Flying Humanoid Robot
This article presents iRonCub 3, a jet-powered humanoid robot, and its first flight experiments. Unlike traditional aerial vehicles, iRonCub 3 aims to achieve flight using a full-body humanoid form, which poses unique challenges in control, estimation, and system integration. We highlight the robot's current mechanical and software architecture, including its propulsion system, control framework, and experimental infrastructure. The control and estimation framework is first validated in simulation by performing a takeoff and tracking a reference trajectory. Then, we demonstrate, for the first time, a liftoff of a jet-powered humanoid robot - an initial but significant step toward aerial humanoid mobility. Also, we detail how the experimental area around a jet-powered humanoid robot should be designed in order to deal with a level of complexity that is substantially superior than indoor humanoid robot experiments.
RoboTwin: A Robotic Teleoperation Framework Using Digital Twins
Robotic surgery imposes a significant cognitive burden on the surgeon. This cognitive burden increases in the case of remote robotic surgeries due to latency between entities and thus might affect the quality of surgery. Here, the patient side and the surgeon side are geographically separated by hundreds to thousands of kilometres. Real-time teleoperation of robots requires strict latency bounds for control and feedback. We propose a dual digital twin (DT) framework and explain the simulation environment and teleoperation framework. Here, the doctor visually controls the locally available DT of the patient side and thus experiences minimum latency. The second digital twin serves two purposes. Firstly, it provides a layer of safety for operator-related mishaps, and secondly, it conveys the coordinates of known and unknown objects back to the operator's side digital twin. We show that teleoperation accuracy and user experience are enhanced with our approach. Experimental results using the NASA-TLX metric show that the quality of surgery is vastly improved with DT, perhaps due to reduced cognitive burden. The network data rate for identifying objects at the operator side is 25x lower than normal.
Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody
Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
comment: Accepted to Interspeech 2025
Robust and Safe Multi-Agent Reinforcement Learning Framework with Communication for Autonomous Vehicles
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for many multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance safety of the system. However, zero-shot transfer of simulator-trained MARL policies to hardware dynamic systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper introduces RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experiment results on F1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
Globally Consistent RGB-D SLAM with 2D Gaussian Splatting
Recently, 3D Gaussian splatting-based RGB-D SLAM displays remarkable performance of high-fidelity 3D reconstruction. However, the lack of depth rendering consistency and efficient loop closure limits the quality of its geometric reconstructions and its ability to perform globally consistent mapping online. In this paper, we present 2DGS-SLAM, an RGB-D SLAM system using 2D Gaussian splatting as the map representation. By leveraging the depth-consistent rendering property of the 2D variant, we propose an accurate camera pose optimization method and achieve geometrically accurate 3D reconstruction. In addition, we implement efficient loop detection and camera relocalization by leveraging MASt3R, a 3D foundation model, and achieve efficient map updates by maintaining a local active map. Experiments show that our 2DGS-SLAM approach achieves superior tracking accuracy, higher surface reconstruction quality, and more consistent global map reconstruction compared to existing rendering-based SLAM methods, while maintaining high-fidelity image rendering and improved computational efficiency.
comment: 18 pages
Towards Predicting Any Human Trajectory In Context
Predicting accurate future trajectories of pedestrians is essential for autonomous systems but remains a challenging task due to the need for adaptability in different environments and domains. A common approach involves collecting scenario-specific data and performing fine-tuning via backpropagation. However, this process is often impractical on edge devices due to constrained computational resources. To address this challenge, we introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that enables rapid adaptation without fine-tuning on the scenario-specific data. We propose a spatio-temporal similarity-based example selection (STES) method that selects relevant examples from previously observed trajectories within the same scene by identifying similar motion patterns at corresponding locations. To further refine this selection, we introduce prediction-guided example selection (PG-ES), which selects examples based on both the past trajectory and the predicted future trajectory, rather than relying solely on the past trajectory. This approach allows the model to account for long-term dynamics when selecting examples. Finally, instead of relying on small real-world datasets with limited scenario diversity, we train our model on a large-scale synthetic dataset to enhance its prediction ability by leveraging in-context examples. Extensive experiments demonstrate that TrajICL achieves remarkable adaptation across both in-domain and cross-domain scenarios, outperforming even fine-tuned approaches across multiple public benchmarks. The code will be released at https://fujiry0.github.io/TrajICL-project-page.
Max Entropy Moment Kalman Filter for Polynomial Systems with Arbitrary Noise
Designing optimal Bayes filters for nonlinear non-Gaussian systems is a challenging task. The main difficulties are: 1) representing complex beliefs, 2) handling non-Gaussian noise, and 3) marginalizing past states. To address these challenges, we focus on polynomial systems and propose the Max Entropy Moment Kalman Filter (MEM-KF). To address 1), we represent arbitrary beliefs by a Moment-Constrained Max-Entropy Distribution (MED). The MED can asymptotically approximate almost any distribution given an increasing number of moment constraints. To address 2), we model the noise in the process and observation model as MED. To address 3), we propagate the moments through the process model and recover the distribution as MED, thus avoiding symbolic integration, which is generally intractable. All the steps in MEM-KF, including the extraction of a point estimate, can be solved via convex optimization. We showcase the MEM-KF in challenging robotics tasks, such as localization with unknown data association.
Improving Multi-Vehicle Perception Fusion with Millimeter-Wave Radar Assistance
Cooperative perception enables vehicles to share sensor readings and has become a new paradigm to improve driving safety, where the key enabling technology for realizing this vision is to real-time and accurately align and fuse the perceptions. Recent advances to align the views rely on high-density LiDAR data or fine-grained image feature representations, which however fail to meet the requirements of accuracy, real-time, and adaptability for autonomous driving. To this end, we present MMatch, a lightweight system that enables accurate and real-time perception fusion with mmWave radar point clouds. The key insight is that fine-grained spatial information provided by the radar present unique associations with all the vehicles even in two separate views. As a result, by capturing and understanding the unique local and global position of the targets in this association, we can quickly find out all the co-visible vehicles for view alignment. We implement MMatch on both the datasets collected from the CARLA platform and the real-world traffic with over 15,000 radar point cloud pairs. Experimental results show that MMatch achieves decimeter-level accuracy within 59ms, which significantly improves the reliability for autonomous driving.
comment: to appear in IEEE INFOCOM 2025
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving
End-to-end autonomous driving systems map sensor data directly to control commands, but remain opaque, lack interpretability, and offer no formal safety guarantees. While recent vision-language-guided reinforcement learning (RL) methods introduce semantic feedback, they often rely on static prompts and fixed objectives, limiting adaptability to dynamic driving scenes. We present DriveMind, a unified semantic reward framework that integrates: (i) a contrastive Vision-Language Model (VLM) encoder for stepwise semantic anchoring; (ii) a novelty-triggered VLM encoder-decoder, fine-tuned via chain-of-thought (CoT) distillation, for dynamic prompt generation upon semantic drift; (iii) a hierarchical safety module enforcing kinematic constraints (e.g., speed, lane centering, stability); and (iv) a compact predictive world model to reward alignment with anticipated ideal states. DriveMind achieves 19.4 +/- 2.3 km/h average speed, 0.98 +/- 0.03 route completion, and near-zero collisions in CARLA Town 2, outperforming baselines by over 4% in success rate. Its semantic reward generalizes zero-shot to real dash-cam data with minimal distributional shift, demonstrating robust cross-domain alignment and potential for real-world deployment.
Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms
Small Uncrewed Aerial Systems (sUAS) are increasingly deployed as autonomous swarms in search-and-rescue and other disaster-response scenarios. In these settings, they use computer vision (CV) to detect objects of interest and autonomously adapt their missions. However, traditional CV systems often struggle to recognize unfamiliar objects in open-world environments or to infer their relevance for mission planning. To address this, we incorporate large language models (LLMs) to reason about detected objects and their implications. While LLMs can offer valuable insights, they are also prone to hallucinations and may produce incorrect, misleading, or unsafe recommendations. To ensure safe and sensible decision-making under uncertainty, high-level decisions must be governed by cognitive guardrails. This article presents the design, simulation, and real-world integration of these guardrails for sUAS swarms in search-and-rescue missions.
comment: 16 pages, 8 figures
Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration
Object-Goal Navigation (OGN) remains challenging in real-world, multi-floor environments and under open-vocabulary object descriptions. We observe that most episodes in widely used benchmarks such as HM3D and MP3D involve multi-floor buildings, with many requiring explicit floor transitions. However, existing methods are often limited to single-floor settings or predefined object categories. To address these limitations, we tackle two key challenges: (1) efficient cross-level planning and (2) zero-shot object-goal navigation (ZS-OGN), where agents must interpret novel object descriptions without prior exposure. We propose ASCENT, a framework that combines a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module leveraging Large Language Models (LLMs) for context-aware exploration, without requiring additional training on new object semantics or locomotion data. Our method outperforms state-of-the-art ZS-OGN approaches on HM3D and MP3D benchmarks while enabling efficient multi-floor navigation. We further validate its practicality through real-world deployment on a quadruped robot, achieving successful object exploration across unseen floors.
comment: Preprint; 27 pages, 12 figures, 10 tables; Project Page at https://zeying-gong.github.io/projects/ascent
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications -- parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
comment: Project webpage: https://younggyo.me/fast_td3
Multimodal Sensing and Machine Learning to Compare Printed and Verbal Assembly Instructions Delivered by a Social Robot
In this paper, we compare a manual assembly task communicated to workers using both printed and robot-delivered instructions. The comparison was made using physiological signals (blood volume pulse (BVP) and electrodermal activity (EDA)) collected from individuals during an experimental study. In addition, we also collected responses of individuals using the NASA Task Load Index (TLX) survey. Furthermore, we mapped the collected physiological signals to the responses of participants for NASA TLX to predict their workload. For both the classification problems, we compare the performance of Convolutional Neural Networks (CNNs) and Long-Short-Term Memory (LSTM) models. Results show that for our CNN-based approach using multimodal data (both BVP and EDA) gave better results than using just BVP (approx. 8.38% more) and EDA (approx 20.49% more). Our LSTM-based model too had better results when we used multimodal data (approx 8.38% more than just BVP and 6.70% more than just EDA). Overall, CNNs performed better than LSTMs for classifying physiologies for paper vs robot-based instruction by 7.72%. The CNN-based model was able to give better classification results (approximately 17.83% more on an average across all responses of the NASA TLX) within a few minutes of training compared to the LSTM-based models.
comment: Accepted to IEEE CASE 2025
Fall Prediction for Bipedal Robots: The Standing Phase
This paper presents a novel approach to fall prediction for bipedal robots, specifically targeting the detection of potential falls while standing caused by abrupt, incipient, and intermittent faults. Leveraging a 1D convolutional neural network (CNN), our method aims to maximize lead time for fall prediction while minimizing false positive rates. The proposed algorithm uniquely integrates the detection of various fault types and estimates the lead time for potential falls. Our contributions include the development of an algorithm capable of detecting abrupt, incipient, and intermittent faults in full-sized robots, its implementation using both simulation and hardware data for a humanoid robot, and a method for estimating lead time. Evaluation metrics, including false positive rate, lead time, and response time, demonstrate the efficacy of our approach. Particularly, our model achieves impressive lead times and response times across different fault scenarios with a false positive rate of 0. The findings of this study hold significant implications for enhancing the safety and reliability of bipedal robotic systems.
comment: \c{opyright} 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation CVPR 2025
Building a lifelong robot that can effectively leverage prior knowledge for continuous skill acquisition remains significantly challenging. Despite the success of experience replay and parameter-efficient methods in alleviating catastrophic forgetting problem, naively applying these methods causes a failure to leverage the shared primitives between skills. To tackle these issues, we propose Primitive Prompt Learning (PPL), to achieve lifelong robot manipulation via reusable and extensible primitives. Within our two stage learning scheme, we first learn a set of primitive prompts to represent shared primitives through multi-skills pre-training stage, where motion-aware prompts are learned to capture semantic and motion shared primitives across different skills. Secondly, when acquiring new skills in lifelong span, new prompts are appended and optimized with frozen pretrained prompts, boosting the learning via knowledge transfer from old skills to new ones. For evaluation, we construct a large-scale skill dataset and conduct extensive experiments in both simulation and real-world tasks, demonstrating PPL's superior performance over state-of-the-art methods.
comment: Accepted to CVPR 2025
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
The reward hypothesis states that all goals and purposes can be understood as the maximization of a received scalar reward signal. However, in practice, defining such a reward signal is notoriously difficult, as humans are often unable to predict the optimal behavior corresponding to a reward function. Natural language offers an intuitive alternative for instructing reinforcement learning (RL) agents, yet previous language-conditioned approaches either require costly supervision or test-time training given a language instruction. In this work, we present a new approach that uses a pretrained RL agent trained using only unlabeled, offline interactions--without task-specific supervision or labeled trajectories--to get zero-shot test-time policy inference from arbitrary natural language instructions. We introduce a framework comprising three steps: imagine, project, and imitate. First, the agent imagines a sequence of observations corresponding to the provided language description using video generative models. Next, these imagined observations are projected into the target environment domain. Finally, an agent pretrained in the target environment with unsupervised RL instantly imitates the projected observation sequence through a closed-form solution. To the best of our knowledge, our method, RLZero, is the first approach to show direct language-to-behavior generation abilities on a variety of tasks and environments without any in-domain supervision. We further show that components of RLZero can be used to generate policies zero-shot from cross-embodied videos, such as those available on YouTube, even for complex embodiments like humanoids.
comment: 26 pages
HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval
We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion of the human hand from the hand demonstration and retrieves robot sub-trajectories in two stages: first filtering by visual similarity, then retrieving trajectories with similar behaviors to the hand. Fine-tuning a policy on the retrieved data enables real-time learning of tasks in under four minutes, without requiring calibrated cameras or detailed hand pose estimation. Experiments also show that HAND outperforms retrieval baselines by over 2x in average task success rates on real robots. Videos can be found at our project website: https://liralab.usc.edu/handretrieval/.
VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception
The performance of legged locomotion is closely tied to the accuracy and comprehensiveness of state observations. Blind policies, which rely solely on proprioception, are considered highly robust due to the reliability of proprioceptive observations. However, these policies significantly limit locomotion speed and often require collisions with the terrain to adapt. In contrast, Vision policies allows the robot to plan motions in advance and respond proactively to unstructured terrains with an online perception module. However, perception is often compromised by noisy real-world environments, potential sensor failures, and the limitations of current simulations in presenting dynamic or deformable terrains. Humanoid robots, with high degrees of freedom and inherently unstable morphology, are particularly susceptible to misguidance from deficient perception, which can result in falls or termination on challenging dynamic terrains. To leverage the advantages of both vision and blind policies, we propose VB-Com, a composite framework that enables humanoid robots to determine when to rely on the vision policy and when to switch to the blind policy under perceptual deficiency. We demonstrate that VB-Com effectively enables humanoid robots to traverse challenging terrains and obstacles despite perception deficiencies caused by dynamic terrains or perceptual noise.
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
CogAD: Cognitive-Hierarchy Guided End-to-End Autonomous Driving
While end-to-end autonomous driving has advanced significantly, prevailing methods remain fundamentally misaligned with human cognitive principles in both perception and planning. In this paper, we propose CogAD, a novel end-to-end autonomous driving model that emulates the hierarchical cognition mechanisms of human drivers. CogAD implements dual hierarchical mechanisms: global-to-local context processing for human-like perception and intent-conditioned multi-mode trajectory generation for cognitively-inspired planning. The proposed method demonstrates three principal advantages: comprehensive environmental understanding through hierarchical perception, robust planning exploration enabled by multi-level planning, and diverse yet reasonable multi-modal trajectory generation facilitated by dual-level uncertainty modeling. Extensive experiments on nuScenes and Bench2Drive demonstrate that CogAD achieves state-of-the-art performance in end-to-end planning, exhibiting particular superiority in long-tail scenarios and robust generalization to complex real-world driving conditions.
Multiagent Systems
Robust and Safe Multi-Agent Reinforcement Learning Framework with Communication for Autonomous Vehicles
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for many multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance safety of the system. However, zero-shot transfer of simulator-trained MARL policies to hardware dynamic systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper introduces RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experiment results on F1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
Will Agents Replace Us? Perceptions of Autonomous Multi-Agent AI
Autonomous multi-agent AI systems are poised to transform various industries, particularly software development and knowledge work. Understanding current perceptions among professionals is crucial for anticipating adoption challenges, ethical considerations, and future workforce development. This study analyzes responses from 130 participants to a survey on the capabilities, impact, and governance of AI agents. We explore expected timelines for AI replacing programmers, identify perceived barriers to deployment, and examine beliefs about responsibility when agents make critical decisions. Key findings reveal three distinct clusters of respondents. While the study explored factors associated with current AI agent deployment, the initial logistic regression model did not yield statistically significant predictors, suggesting that deployment decisions are complex and may be influenced by factors not fully captured or that a larger sample is needed. These insights highlight the need for organizations to address compliance concerns (a commonly cited barrier) and establish clear governance frameworks as they integrate autonomous agents into their workflows.
comment: 15 pages, 5 figures, code available at https://github.com/nibzard/agent-perceptions
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration
We introduce EvoGit, a decentralized multi-agent framework for collaborative software development driven by autonomous code evolution. EvoGit deploys a population of independent coding agents, each proposing edits to a shared codebase without centralized coordination, explicit message passing, or shared memory. Instead, all coordination emerges through a Git-based phylogenetic graph that tracks the full version lineage and enables agents to asynchronously read from and write to the evolving code repository. This graph-based structure supports fine-grained branching, implicit concurrency, and scalable agent interaction while preserving a consistent historical record. Human involvement is minimal but strategic: users define high-level goals, periodically review the graph, and provide lightweight feedback to promote promising directions or prune unproductive ones. Experiments demonstrate EvoGit's ability to autonomously produce functional and modular software artifacts across two real-world tasks: (1) building a web application from scratch using modern frameworks, and (2) constructing a meta-level system that evolves its own language-model-guided solver for the bin-packing optimization problem. Our results underscore EvoGit's potential to establish a new paradigm for decentralized, automated, and continual software development. EvoGit is open-sourced at https://github.com/BillHuang2001/evogit.
Improving Multi-Vehicle Perception Fusion with Millimeter-Wave Radar Assistance
Cooperative perception enables vehicles to share sensor readings and has become a new paradigm to improve driving safety, where the key enabling technology for realizing this vision is to real-time and accurately align and fuse the perceptions. Recent advances to align the views rely on high-density LiDAR data or fine-grained image feature representations, which however fail to meet the requirements of accuracy, real-time, and adaptability for autonomous driving. To this end, we present MMatch, a lightweight system that enables accurate and real-time perception fusion with mmWave radar point clouds. The key insight is that fine-grained spatial information provided by the radar present unique associations with all the vehicles even in two separate views. As a result, by capturing and understanding the unique local and global position of the targets in this association, we can quickly find out all the co-visible vehicles for view alignment. We implement MMatch on both the datasets collected from the CARLA platform and the real-world traffic with over 15,000 radar point cloud pairs. Experimental results show that MMatch achieves decimeter-level accuracy within 59ms, which significantly improves the reliability for autonomous driving.
comment: to appear in IEEE INFOCOM 2025
Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment AAAI-25
Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents' weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of misalignment across different scenarios. We then apply our model to two case studies, including an autonomous vehicle setting, showcasing its practical utility. Our approach offers enhanced explanatory power for complex sociotechnical environments and could inform the design of more aligned AI systems in real-world applications.
comment: 7 pages, 8 figures, 3 tables, forthcoming at the AAAI-25 Special Track on AI Alignment
Exploring Model Kinship for Merging Large Language Models
Model merging has become one of the key technologies for enhancing the capabilities and efficiency of Large Language Models (LLMs). However, our understanding of the expected performance gains and principles when merging any two models remains limited. In this work, we introduce model kinship, the degree of similarity or relatedness between LLMs, analogous to biological evolution. With comprehensive empirical analysis, we find that there is a certain relationship between model kinship and the performance gains after model merging, which can help guide our selection of candidate models. Inspired by this, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets. Specifically, we discover that using model kinship as a criterion can assist us in continuously performing model merging, alleviating the degradation (local optima) in model evolution, whereas model kinship can serve as a guide to escape these traps. Code is available at https://github.com/zjunlp/ModelKinship.
comment: Ongoing work
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement ACL 2025
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
comment: ACL 2025
Human-AI Governance (HAIG): A Trust-Utility Approach
This paper introduces the HAIG framework for analysing trust dynamics across evolving human-AI relationships. Current categorical frameworks (e.g., "human-in-the-loop" models) inadequately capture how AI systems evolve from tools to partners, particularly as foundation models demonstrate emergent capabilities and multi-agent systems exhibit autonomous goal-setting behaviours. As systems advance, agency redistributes in complex patterns that are better represented as positions along continua rather than discrete categories, though progression may include both gradual shifts and significant step changes. The HAIG framework operates across three levels: dimensions (Decision Authority Distribution, Process Autonomy, and Accountability Configuration), continua (gradual shifts along each dimension), and thresholds (critical points requiring governance adaptation). Unlike risk-based or principle-based approaches, HAIG adopts a trust-utility orientation, focusing on maintaining appropriate trust relationships that maximise utility while ensuring sufficient safeguards. Our analysis reveals how technical advances in self-supervision, reasoning authority, and distributed decision-making drive non-uniform trust evolution across both contextual variation and technological advancement. Case studies in healthcare and European regulation demonstrate how HAIG complements existing frameworks while offering a foundation for alternative approaches that anticipate governance challenges before they emerge.
comment: 32 pages including references and appendix, 25 pages core text, 3 figures, 3 tables
Systems and Control (CS)
Distributed perception of social power in influence networks with stubborn individuals
Social power quantifies the ability of individuals to influence others and plays a central role in social influence networks. Yet computing social power typically requires global knowledge and significant computational or storage capability, especially in large-scale networks with stubborn individuals. This paper develops distributed algorithms for social power perception in groups with stubborn individuals. We propose two dynamical models for distributed perception of social power based on the Friedkin-Johnsen (FJ) opinion dynamics: one without and one with reflected appraisals. In both scenarios, our perception mechanism begins with independent initial perceptions and relies primarily on local information: each individual only needs to know its neighbors' stubbornness or self-appraisals, the influence weights they accord and the group size. We provide rigorous dynamical system analysis to characterize the properties of equilibria, invariant sets and convergence. Conditions under which individuals' perceived social power converges to the actual social power are established. The proposed perception mechanism demonstrates strong robustness to reflected appraisals, irrational perceptions, and timescale variations. Numerical examples are provided to illustrate our results.
comment: 15 pages, 4 figures
The Fastest Known First-Order Method for Minimizing Twice Continuously Differentiable Smooth Strongly Convex Functions
We consider iterative gradient-based optimization algorithms applied to functions that are smooth and strongly convex. The fastest globally convergent algorithm for this class of functions is the Triple Momentum (TM) method. We show that if the objective function is also twice continuously differentiable, a new, faster algorithm emerges, which we call $C^2$-Momentum (C2M). We prove that C2M is globally convergent and that its worst-case convergence rate is strictly faster than that of TM, with no additional computational cost. We validate our theoretical findings with numerical examples, demonstrating that C2M outperforms TM when the objective function is twice continuously differentiable.
comment: 6 pages, accepted as a joint contribution to IEEE Control Systems Letters (L-CSS) and the 2025 American Control Conference (ACC)
Generative diffusion posterior sampling for informative likelihoods
Sequential Monte Carlo (SMC) methods have recently shown successful results for conditional sampling of generative diffusion models. In this paper we propose a new diffusion posterior SMC sampler achieving improved statistical efficiencies, particularly under outlier conditions or highly informative likelihoods. The key idea is to construct an observation path that correlates with the diffusion model and to design the sampler to leverage this correlation for more efficient sampling. Empirical results conclude the efficiency.
comment: Commemorative issue
Chaotic Noncoherent SWIPT in Multi-Functional RIS-Aided Systems
In this letter, we investigate the design of chaotic signal-based transmit waveforms in a multi-functional reconfigurable intelligent surface (MF-RIS)-aided set-up for simultaneous wireless information and power transfer. We propose a differential chaos shift keying-based MF-RIS-aided set-up, where the MF-RIS is partitioned into three non-overlapping surfaces. The elements of the first sub-surface perform energy harvesting (EH), which in turn, provide the required power to the other two sub-surfaces responsible for transmission and reflection of the incident signal. By considering a frequency selective scenario and a realistic EH model, we characterize the chaotic MF-RIS-aided system in terms of its EH performance and the associated bit error rate. Thereafter, we characterize the harvested energy-bit error rate trade-off and derive a lower bound on the number of elements required to operate in the EH mode. Accordingly, we propose novel transmit waveform designs to demonstrate the importance of the choice of appropriate system parameters in the context of achieving self-sustainability.
comment: This work has been submitted to an IEEE journal for possible publication
AEQUAM: Accelerating Quantum Algorithm Validation through FPGA-Based Emulation
This work presents AEQUAM (Area Efficient QUAntum eMulation), a toolchain that enables faster and more accessible quantum circuit verification. It consists of a compiler that translates OpenQASM 2.0 into RISC-like instructions, Cython software models for selecting number representations and simulating circuits, and a VHDL generator that produces RTL descriptions for FPGA-based hardware emulators. The architecture leverages a SIMD approach to parallelize computation and reduces complexity by exploiting the sparsity of quantum gate matrices. The VHDL generator allows customization of the number of emulated qubits and parallelization levels to meet user requirements. Synthesized on an Altera Cyclone 10LP FPGA with a 20-bit fixed-point representation and nearest-type approximation, the architecture demonstrates better scalability than other state-of-the-art emulators. Specifically, the emulator has been validated by exploiting the well consolidated benchmark of mqt bench framework.
RoboTwin: A Robotic Teleoperation Framework Using Digital Twins
Robotic surgery imposes a significant cognitive burden on the surgeon. This cognitive burden increases in the case of remote robotic surgeries due to latency between entities and thus might affect the quality of surgery. Here, the patient side and the surgeon side are geographically separated by hundreds to thousands of kilometres. Real-time teleoperation of robots requires strict latency bounds for control and feedback. We propose a dual digital twin (DT) framework and explain the simulation environment and teleoperation framework. Here, the doctor visually controls the locally available DT of the patient side and thus experiences minimum latency. The second digital twin serves two purposes. Firstly, it provides a layer of safety for operator-related mishaps, and secondly, it conveys the coordinates of known and unknown objects back to the operator's side digital twin. We show that teleoperation accuracy and user experience are enhanced with our approach. Experimental results using the NASA-TLX metric show that the quality of surgery is vastly improved with DT, perhaps due to reduced cognitive burden. The network data rate for identifying objects at the operator side is 25x lower than normal.
IAE Optimized PID Tuning with Phase Margin and Crossover Frequency Constraints
This paper presents PMwc-Tune, a novel PID tuning method that uniquely combines frequency-domain robustness constraints with time-domain performance optimization through constrained nonlinear programming. The key contribution is a unified formulation that simultaneously enforces phase margin and crossover frequency requirements (via nonlinear equality constraints) while minimizing the Integral Absolute Error (IAE) of the closed-loop response. The algorithm employs Sequential Quadratic Programming (SQP) to solve this constrained optimization problem, guaranteeing specification attainment within numerical tolerances while optimizing transient performance. Numerical validation on benchmark systems demonstrates precise convergence to design targets (phase margin and crossover frequency errors <1%) with a 4.6% IAE reduction compared to MATLAB's pidtune. The open-source implementation provides both methodological transparency and practical design flexibility, enabling PID controllers that rigorously balance frequency-domain robustness and time-domain performance.
comment: 2 figures, 1 table
HMPC-assisted Adversarial Inverse Reinforcement Learning for Smart Home Energy Management
This letter proposes an Adversarial Inverse Reinforcement Learning (AIRL)-based energy management method for a smart home, which incorporates an implicit thermal dynamics model. In the proposed method, historical optimal decisions are first generated using a neural network-assisted Hierarchical Model Predictive Control (HMPC) framework. These decisions are then used as expert demonstrations in the AIRL module, which aims to train a discriminator to distinguish expert demonstrations from transitions generated by a reinforcement learning agent policy, while simultaneously updating the agent policy that can produce transitions to confuse the discriminator. The proposed HMPC-AIRL method eliminates the need for explicit thermal dynamics models, prior or predictive knowledge of uncertain parameters, or manually designed reward functions. Simulation results based on real-world traces demonstrate the effectiveness and data efficiency of the proposed method.
comment: 6 pages, 8 figures
Interpretable Spatio-Temporal Features Extraction based Industrial Process Modeling and Monitoring by Soft Sensor
Data-driven soft sensors have been widely applied in complex industrial processes. However, the interpretable spatio-temporal features extraction by soft sensors remains a challenge. In this light, this work introduces a novel method termed spatio-temporal consistent and interpretable model (STCIM). First, temporal and spatial features are captured and aligned by a far topological spatio-temporal consistency extraction block. Then, the features are mapped into an interpretable latent space for further prediction by explicitly giving physical meanings to latent variables. The efficacy of the proposed STCIM is demonstrated through the modeling of two generated datasets and a real-life dataset of coal-fired power plants. The corresponding experiments show: 1) The generalization of STCIM outperforms other methods, especially in different operation situations. 2) The far topological spatio-temporal consistency is vital for feature alignment. 3) The hyper-parameters of physics-informed interpretable latent space loss decide the performance of STCIM.
Conceal Truth while Show Fake: T/F Frequency Multiplexing based Anti-Intercepting Transmission
In wireless communication adversarial scenarios, signals are easily intercepted by non-cooperative parties, exposing the transmission of confidential information. This paper proposes a true-and-false (T/F) frequency multiplexing based anti-intercepting transmission scheme capable of concealing truth while showing fake (CTSF), integrating both offensive and defensive strategies. Specifically, through multi-source cooperation, true and false signals are transmitted over multiple frequency bands using non-orthogonal frequency division multiplexing. The decoy signals are used to deceive non-cooperative eavesdropper, while the true signals are hidden to counter interception threats. Definitions for the interception and deception probabilities are provided, and the mechanism of CTSF is discussed. To improve the secrecy performance of true signals while ensuring decoy signals achieve their deceptive purpose, we model the problem as maximizing the sum secrecy rate of true signals, with constraint on the decoy effect. Furthermore, we propose a bi-stage alternating dual-domain optimization approach for joint optimization of both power allocation and correlation coefficients among multiple sources, and a Newton's method is proposed for fitting the T/F frequency multiplexing factor. In addition, simulation results verify the efficiency of anti-intercepting performance of our proposed CTSF scheme.
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning
Action-dependent individual policies, which incorporate both environmental states and the actions of other agents in decision-making, have emerged as a promising paradigm for achieving global optimality in multi-agent reinforcement learning (MARL). However, the existing literature often adopts auto-regressive action-dependent policies, where each agent's policy depends on the actions of all preceding agents. This formulation incurs substantial computational complexity as the number of agents increases, thereby limiting scalability. In this work, we consider a more generalized class of action-dependent policies, which do not necessarily follow the auto-regressive form. We propose to use the `action dependency graph (ADG)' to model the inter-agent action dependencies. Within the context of MARL problems structured by coordination graphs, we prove that an action-dependent policy with a sparse ADG can achieve global optimality, provided the ADG satisfies specific conditions specified by the coordination graph. Building on this theoretical foundation, we develop a tabular policy iteration algorithm with guaranteed global optimality. Furthermore, we integrate our framework into several SOTA algorithms and conduct experiments in complex environments. The empirical results affirm the robustness and applicability of our approach in more general scenarios, underscoring its potential for broader MARL challenges.
OODTE: A Differential Testing Engine for the ONNX Optimizer
With over 700 stars on GitHub and being part of the official ONNX repository, the ONNX Optimizer is the default tool for applying graph-based optimizations to ONNX models. Despite its widespread use, its ability to maintain model accuracy during optimization has not been thoroughly investigated. In this work, we present OODTE, a utility designed to automatically and comprehensively evaluate the correctness of the ONNX Optimizer. OODTE adopts a straightforward yet powerful differential testing and evaluation methodology, which can be readily adapted for use with other compiler optimizers. Specifically, OODTE takes a collection of ONNX models, applies optimizations, and executes both the original and optimized versions across a user-defined input set, automatically capturing any issues encountered during optimization. When discrepancies in accuracy arise, OODTE iteratively isolates the responsible optimization pass by repeating the process at a finer granularity. We applied OODTE to 130 well-known models from the official ONNX Model Hub, spanning diverse tasks including classification, object detection, semantic segmentation, text summarization, question answering, and sentiment analysis. Our evaluation revealed that 9.2% of the model instances either caused the optimizer to crash or led to the generation of invalid models using default optimization strategies. Additionally, 30% of classification models and 16.6% of object detection and segmentation models exhibited differing outputs across original and optimized versions, whereas models focused on text-related tasks were generally robust to optimization. OODTE uncovered 15 issues-14 previously unknown-affecting 9 of 47 optimization passes and the optimizer overall. All issues were reported to the ONNX Optimizer team. OODTE offers a simple but effective framework for validating AI model optimizers, applicable beyond the ONNX ecosystem.
comment: 12 pages, 3 figures, 3 tables
Generalized hybrid momentum maps and reduction by symmetries of forced mechanical systems with inelastic collisions
This paper discusses reduction by symmetries for autonomous and non-autonomous forced mechanical systems with inelastic collisions. In particular, we introduce the notion of generalized hybrid momentum map and hybrid constants of the motion to give general conditions on whether it is possible to perform symmetry reduction for Hamiltonian and Lagrangian systems subject to non-conservative external forces and non-elastic impacts, as well as its extension to time-dependent mechanical systems subject to time-dependent external forces and time-dependent inelastic collisions. We illustrate the applicability of the method with examples and numerical simulations.
comment: 29 pages, 4 figures. Accepted in J. Math. Phys
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 4 pages, comments are welcome. V3 contains more details
Minimum Radiative Heat and Propellant Aerocapture Guidance with Attitude Kinematics Constraints
Aerocapture leverages atmospheric drag to convert a spacecraft's hyperbolic trajectory into a bound orbit. For some aerocapture missions, heating due to the radiation of high temperature gases in the shock-layer can be much larger than the heat due to convection. This paper provides analytical proof and numerical validation that radiative heat load is minimized by the same trajectory that minimizes the final {\Delta} V: a single switch bang-bang trajectory, starting with lift up. The proof is very general and is valid for several formulations of radiative heat flux; further, the same proof can be used to conclude that convective heat load, computed according to many of the available formulations, is instead maximized by that trajectory. Further, a novel guidance that plans a bang-bang trajectory with constraints in the attitude kinematics is introduced. While achieving performance similar to that of the current state-of-the-art, the inclusion of constraints in attitude kinematics allows for much less tuning. Finally, a lateral guidance that makes use of information on the final inclination of the predicted trajectory is introduced. Such guidance allows for very high accuracy in the inclination requirements with only two reversals, by requiring a single parameter to be tuned.
comment: Accepted to publish in the Journal of Guidance, Control, and Dynamics
Verifiable Mission Planning For Space Operations
As space missions become more complex, planning methods must maximize mission performance while rigorously enforcing safety. We develop a probabilistic approach based on a finite-horizon Markov decision process to optimize spacecraft operations planning with safety guarantees. In the model, states capture essential mission parameters, and actions represent the operational adjustments needed to meet mission objectives. By directly incorporating uncertainties from environmental conditions and spacecraft dynamics, an optimal sequence of actions is computed that maximizes expected rewards and strictly enforces safety constraints. Numerical experiments on the GRACE-FO mission demonstrate robust performance under uncertainties while providing probabilistic safety guarantees, offering a reliable solution for autonomous spacecraft operations.
comment: Initial submission to the 2025 AAS/AIAA Astrodynamics Specialist Conference
Online Control of Linear Systems under Unbounded Noise
This paper investigates the problem of controlling a linear system under possibly unbounded stochastic noise with unknown convex cost functions, known as an online control problem. In contrast to the existing work, which assumes the boundedness of noise, we show that an $ \tilde{O}(\sqrt{T}) $ high-probability regret can be achieved under unbounded noise, where $ T $ denotes the time horizon. Notably, the noise is only required to have a finite fourth moment. Moreover, when the costs are strongly convex and the noise is sub-Gaussian, we establish an $ O({\rm poly} (\log T)) $ regret bound.
comment: 41 pages
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
Distances between finite-horizon linear behaviors
The paper introduces a class of distances for linear behaviors over finite time horizons. These distances allow for comparisons between finite-horizon linear behaviors represented by matrices of possibly different dimensions. They remain invariant under coordinate changes, rotations, and permutations, ensuring independence from input-output partitions. Moreover, they naturally encode complexity-misfit trade-offs for Linear Time-Invariant (LTI) behaviors, providing a principled solution to a longstanding puzzle in behavioral systems theory. The resulting framework characterizes modeling as a minimum distance problem, identifying the Most Powerful Unfalsified Model (MPUM) as optimal among all systems unfalsified by a given dataset. Finally, we illustrate the value of these metrics in a time series anomaly detection task, where their finer resolution yields superior performance over existing distances.
comment: IEEE Control Systems Letters / 64th IEEE Conference on Decision and Control
Finite Sample Analysis of Tensor Decomposition for Learning Mixtures of Linear Systems
We study the problem of learning mixtures of linear dynamical systems (MLDS) from input-output data. The mixture setting allows us to leverage observations from related dynamical systems to improve the estimation of individual models. Building on spectral methods for mixtures of linear regressions, we propose a moment-based estimator that uses tensor decomposition to estimate the impulse response parameters of the mixture models. The estimator improves upon existing tensor decomposition approaches for MLDS by utilizing the entire length of the observed trajectories. We provide sample complexity bounds for estimating MLDS in the presence of noise, in terms of both the number of trajectories $N$ and the trajectory length $T$, and demonstrate the performance of the estimator through simulations.
comment: Accepted to 2025 Learning for Dynamics & Control Conference (L4DC) 2025. This is the full version. 38 pages
Systems and Control (EESS)
Distributed perception of social power in influence networks with stubborn individuals
Social power quantifies the ability of individuals to influence others and plays a central role in social influence networks. Yet computing social power typically requires global knowledge and significant computational or storage capability, especially in large-scale networks with stubborn individuals. This paper develops distributed algorithms for social power perception in groups with stubborn individuals. We propose two dynamical models for distributed perception of social power based on the Friedkin-Johnsen (FJ) opinion dynamics: one without and one with reflected appraisals. In both scenarios, our perception mechanism begins with independent initial perceptions and relies primarily on local information: each individual only needs to know its neighbors' stubbornness or self-appraisals, the influence weights they accord and the group size. We provide rigorous dynamical system analysis to characterize the properties of equilibria, invariant sets and convergence. Conditions under which individuals' perceived social power converges to the actual social power are established. The proposed perception mechanism demonstrates strong robustness to reflected appraisals, irrational perceptions, and timescale variations. Numerical examples are provided to illustrate our results.
comment: 15 pages, 4 figures
The Fastest Known First-Order Method for Minimizing Twice Continuously Differentiable Smooth Strongly Convex Functions
We consider iterative gradient-based optimization algorithms applied to functions that are smooth and strongly convex. The fastest globally convergent algorithm for this class of functions is the Triple Momentum (TM) method. We show that if the objective function is also twice continuously differentiable, a new, faster algorithm emerges, which we call $C^2$-Momentum (C2M). We prove that C2M is globally convergent and that its worst-case convergence rate is strictly faster than that of TM, with no additional computational cost. We validate our theoretical findings with numerical examples, demonstrating that C2M outperforms TM when the objective function is twice continuously differentiable.
comment: 6 pages, accepted as a joint contribution to IEEE Control Systems Letters (L-CSS) and the 2025 American Control Conference (ACC)
Generative diffusion posterior sampling for informative likelihoods
Sequential Monte Carlo (SMC) methods have recently shown successful results for conditional sampling of generative diffusion models. In this paper we propose a new diffusion posterior SMC sampler achieving improved statistical efficiencies, particularly under outlier conditions or highly informative likelihoods. The key idea is to construct an observation path that correlates with the diffusion model and to design the sampler to leverage this correlation for more efficient sampling. Empirical results conclude the efficiency.
comment: Commemorative issue
Chaotic Noncoherent SWIPT in Multi-Functional RIS-Aided Systems
In this letter, we investigate the design of chaotic signal-based transmit waveforms in a multi-functional reconfigurable intelligent surface (MF-RIS)-aided set-up for simultaneous wireless information and power transfer. We propose a differential chaos shift keying-based MF-RIS-aided set-up, where the MF-RIS is partitioned into three non-overlapping surfaces. The elements of the first sub-surface perform energy harvesting (EH), which in turn, provide the required power to the other two sub-surfaces responsible for transmission and reflection of the incident signal. By considering a frequency selective scenario and a realistic EH model, we characterize the chaotic MF-RIS-aided system in terms of its EH performance and the associated bit error rate. Thereafter, we characterize the harvested energy-bit error rate trade-off and derive a lower bound on the number of elements required to operate in the EH mode. Accordingly, we propose novel transmit waveform designs to demonstrate the importance of the choice of appropriate system parameters in the context of achieving self-sustainability.
comment: This work has been submitted to an IEEE journal for possible publication
AEQUAM: Accelerating Quantum Algorithm Validation through FPGA-Based Emulation
This work presents AEQUAM (Area Efficient QUAntum eMulation), a toolchain that enables faster and more accessible quantum circuit verification. It consists of a compiler that translates OpenQASM 2.0 into RISC-like instructions, Cython software models for selecting number representations and simulating circuits, and a VHDL generator that produces RTL descriptions for FPGA-based hardware emulators. The architecture leverages a SIMD approach to parallelize computation and reduces complexity by exploiting the sparsity of quantum gate matrices. The VHDL generator allows customization of the number of emulated qubits and parallelization levels to meet user requirements. Synthesized on an Altera Cyclone 10LP FPGA with a 20-bit fixed-point representation and nearest-type approximation, the architecture demonstrates better scalability than other state-of-the-art emulators. Specifically, the emulator has been validated by exploiting the well consolidated benchmark of mqt bench framework.
RoboTwin: A Robotic Teleoperation Framework Using Digital Twins
Robotic surgery imposes a significant cognitive burden on the surgeon. This cognitive burden increases in the case of remote robotic surgeries due to latency between entities and thus might affect the quality of surgery. Here, the patient side and the surgeon side are geographically separated by hundreds to thousands of kilometres. Real-time teleoperation of robots requires strict latency bounds for control and feedback. We propose a dual digital twin (DT) framework and explain the simulation environment and teleoperation framework. Here, the doctor visually controls the locally available DT of the patient side and thus experiences minimum latency. The second digital twin serves two purposes. Firstly, it provides a layer of safety for operator-related mishaps, and secondly, it conveys the coordinates of known and unknown objects back to the operator's side digital twin. We show that teleoperation accuracy and user experience are enhanced with our approach. Experimental results using the NASA-TLX metric show that the quality of surgery is vastly improved with DT, perhaps due to reduced cognitive burden. The network data rate for identifying objects at the operator side is 25x lower than normal.
IAE Optimized PID Tuning with Phase Margin and Crossover Frequency Constraints
This paper presents PMwc-Tune, a novel PID tuning method that uniquely combines frequency-domain robustness constraints with time-domain performance optimization through constrained nonlinear programming. The key contribution is a unified formulation that simultaneously enforces phase margin and crossover frequency requirements (via nonlinear equality constraints) while minimizing the Integral Absolute Error (IAE) of the closed-loop response. The algorithm employs Sequential Quadratic Programming (SQP) to solve this constrained optimization problem, guaranteeing specification attainment within numerical tolerances while optimizing transient performance. Numerical validation on benchmark systems demonstrates precise convergence to design targets (phase margin and crossover frequency errors <1%) with a 4.6% IAE reduction compared to MATLAB's pidtune. The open-source implementation provides both methodological transparency and practical design flexibility, enabling PID controllers that rigorously balance frequency-domain robustness and time-domain performance.
comment: 2 figures, 1 table
HMPC-assisted Adversarial Inverse Reinforcement Learning for Smart Home Energy Management
This letter proposes an Adversarial Inverse Reinforcement Learning (AIRL)-based energy management method for a smart home, which incorporates an implicit thermal dynamics model. In the proposed method, historical optimal decisions are first generated using a neural network-assisted Hierarchical Model Predictive Control (HMPC) framework. These decisions are then used as expert demonstrations in the AIRL module, which aims to train a discriminator to distinguish expert demonstrations from transitions generated by a reinforcement learning agent policy, while simultaneously updating the agent policy that can produce transitions to confuse the discriminator. The proposed HMPC-AIRL method eliminates the need for explicit thermal dynamics models, prior or predictive knowledge of uncertain parameters, or manually designed reward functions. Simulation results based on real-world traces demonstrate the effectiveness and data efficiency of the proposed method.
comment: 6 pages, 8 figures
Interpretable Spatio-Temporal Features Extraction based Industrial Process Modeling and Monitoring by Soft Sensor
Data-driven soft sensors have been widely applied in complex industrial processes. However, the interpretable spatio-temporal features extraction by soft sensors remains a challenge. In this light, this work introduces a novel method termed spatio-temporal consistent and interpretable model (STCIM). First, temporal and spatial features are captured and aligned by a far topological spatio-temporal consistency extraction block. Then, the features are mapped into an interpretable latent space for further prediction by explicitly giving physical meanings to latent variables. The efficacy of the proposed STCIM is demonstrated through the modeling of two generated datasets and a real-life dataset of coal-fired power plants. The corresponding experiments show: 1) The generalization of STCIM outperforms other methods, especially in different operation situations. 2) The far topological spatio-temporal consistency is vital for feature alignment. 3) The hyper-parameters of physics-informed interpretable latent space loss decide the performance of STCIM.
Conceal Truth while Show Fake: T/F Frequency Multiplexing based Anti-Intercepting Transmission
In wireless communication adversarial scenarios, signals are easily intercepted by non-cooperative parties, exposing the transmission of confidential information. This paper proposes a true-and-false (T/F) frequency multiplexing based anti-intercepting transmission scheme capable of concealing truth while showing fake (CTSF), integrating both offensive and defensive strategies. Specifically, through multi-source cooperation, true and false signals are transmitted over multiple frequency bands using non-orthogonal frequency division multiplexing. The decoy signals are used to deceive non-cooperative eavesdropper, while the true signals are hidden to counter interception threats. Definitions for the interception and deception probabilities are provided, and the mechanism of CTSF is discussed. To improve the secrecy performance of true signals while ensuring decoy signals achieve their deceptive purpose, we model the problem as maximizing the sum secrecy rate of true signals, with constraint on the decoy effect. Furthermore, we propose a bi-stage alternating dual-domain optimization approach for joint optimization of both power allocation and correlation coefficients among multiple sources, and a Newton's method is proposed for fitting the T/F frequency multiplexing factor. In addition, simulation results verify the efficiency of anti-intercepting performance of our proposed CTSF scheme.
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning
Action-dependent individual policies, which incorporate both environmental states and the actions of other agents in decision-making, have emerged as a promising paradigm for achieving global optimality in multi-agent reinforcement learning (MARL). However, the existing literature often adopts auto-regressive action-dependent policies, where each agent's policy depends on the actions of all preceding agents. This formulation incurs substantial computational complexity as the number of agents increases, thereby limiting scalability. In this work, we consider a more generalized class of action-dependent policies, which do not necessarily follow the auto-regressive form. We propose to use the `action dependency graph (ADG)' to model the inter-agent action dependencies. Within the context of MARL problems structured by coordination graphs, we prove that an action-dependent policy with a sparse ADG can achieve global optimality, provided the ADG satisfies specific conditions specified by the coordination graph. Building on this theoretical foundation, we develop a tabular policy iteration algorithm with guaranteed global optimality. Furthermore, we integrate our framework into several SOTA algorithms and conduct experiments in complex environments. The empirical results affirm the robustness and applicability of our approach in more general scenarios, underscoring its potential for broader MARL challenges.
OODTE: A Differential Testing Engine for the ONNX Optimizer
With over 700 stars on GitHub and being part of the official ONNX repository, the ONNX Optimizer is the default tool for applying graph-based optimizations to ONNX models. Despite its widespread use, its ability to maintain model accuracy during optimization has not been thoroughly investigated. In this work, we present OODTE, a utility designed to automatically and comprehensively evaluate the correctness of the ONNX Optimizer. OODTE adopts a straightforward yet powerful differential testing and evaluation methodology, which can be readily adapted for use with other compiler optimizers. Specifically, OODTE takes a collection of ONNX models, applies optimizations, and executes both the original and optimized versions across a user-defined input set, automatically capturing any issues encountered during optimization. When discrepancies in accuracy arise, OODTE iteratively isolates the responsible optimization pass by repeating the process at a finer granularity. We applied OODTE to 130 well-known models from the official ONNX Model Hub, spanning diverse tasks including classification, object detection, semantic segmentation, text summarization, question answering, and sentiment analysis. Our evaluation revealed that 9.2% of the model instances either caused the optimizer to crash or led to the generation of invalid models using default optimization strategies. Additionally, 30% of classification models and 16.6% of object detection and segmentation models exhibited differing outputs across original and optimized versions, whereas models focused on text-related tasks were generally robust to optimization. OODTE uncovered 15 issues-14 previously unknown-affecting 9 of 47 optimization passes and the optimizer overall. All issues were reported to the ONNX Optimizer team. OODTE offers a simple but effective framework for validating AI model optimizers, applicable beyond the ONNX ecosystem.
comment: 12 pages, 3 figures, 3 tables
Generalized hybrid momentum maps and reduction by symmetries of forced mechanical systems with inelastic collisions
This paper discusses reduction by symmetries for autonomous and non-autonomous forced mechanical systems with inelastic collisions. In particular, we introduce the notion of generalized hybrid momentum map and hybrid constants of the motion to give general conditions on whether it is possible to perform symmetry reduction for Hamiltonian and Lagrangian systems subject to non-conservative external forces and non-elastic impacts, as well as its extension to time-dependent mechanical systems subject to time-dependent external forces and time-dependent inelastic collisions. We illustrate the applicability of the method with examples and numerical simulations.
comment: 29 pages, 4 figures. Accepted in J. Math. Phys
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 4 pages, comments are welcome. V3 contains more details
Minimum Radiative Heat and Propellant Aerocapture Guidance with Attitude Kinematics Constraints
Aerocapture leverages atmospheric drag to convert a spacecraft's hyperbolic trajectory into a bound orbit. For some aerocapture missions, heating due to the radiation of high temperature gases in the shock-layer can be much larger than the heat due to convection. This paper provides analytical proof and numerical validation that radiative heat load is minimized by the same trajectory that minimizes the final {\Delta} V: a single switch bang-bang trajectory, starting with lift up. The proof is very general and is valid for several formulations of radiative heat flux; further, the same proof can be used to conclude that convective heat load, computed according to many of the available formulations, is instead maximized by that trajectory. Further, a novel guidance that plans a bang-bang trajectory with constraints in the attitude kinematics is introduced. While achieving performance similar to that of the current state-of-the-art, the inclusion of constraints in attitude kinematics allows for much less tuning. Finally, a lateral guidance that makes use of information on the final inclination of the predicted trajectory is introduced. Such guidance allows for very high accuracy in the inclination requirements with only two reversals, by requiring a single parameter to be tuned.
comment: Accepted to publish in the Journal of Guidance, Control, and Dynamics
Verifiable Mission Planning For Space Operations
As space missions become more complex, planning methods must maximize mission performance while rigorously enforcing safety. We develop a probabilistic approach based on a finite-horizon Markov decision process to optimize spacecraft operations planning with safety guarantees. In the model, states capture essential mission parameters, and actions represent the operational adjustments needed to meet mission objectives. By directly incorporating uncertainties from environmental conditions and spacecraft dynamics, an optimal sequence of actions is computed that maximizes expected rewards and strictly enforces safety constraints. Numerical experiments on the GRACE-FO mission demonstrate robust performance under uncertainties while providing probabilistic safety guarantees, offering a reliable solution for autonomous spacecraft operations.
comment: Initial submission to the 2025 AAS/AIAA Astrodynamics Specialist Conference
Online Control of Linear Systems under Unbounded Noise
This paper investigates the problem of controlling a linear system under possibly unbounded stochastic noise with unknown convex cost functions, known as an online control problem. In contrast to the existing work, which assumes the boundedness of noise, we show that an $ \tilde{O}(\sqrt{T}) $ high-probability regret can be achieved under unbounded noise, where $ T $ denotes the time horizon. Notably, the noise is only required to have a finite fourth moment. Moreover, when the costs are strongly convex and the noise is sub-Gaussian, we establish an $ O({\rm poly} (\log T)) $ regret bound.
comment: 41 pages
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
Distances between finite-horizon linear behaviors
The paper introduces a class of distances for linear behaviors over finite time horizons. These distances allow for comparisons between finite-horizon linear behaviors represented by matrices of possibly different dimensions. They remain invariant under coordinate changes, rotations, and permutations, ensuring independence from input-output partitions. Moreover, they naturally encode complexity-misfit trade-offs for Linear Time-Invariant (LTI) behaviors, providing a principled solution to a longstanding puzzle in behavioral systems theory. The resulting framework characterizes modeling as a minimum distance problem, identifying the Most Powerful Unfalsified Model (MPUM) as optimal among all systems unfalsified by a given dataset. Finally, we illustrate the value of these metrics in a time series anomaly detection task, where their finer resolution yields superior performance over existing distances.
comment: IEEE Control Systems Letters / 64th IEEE Conference on Decision and Control
Finite Sample Analysis of Tensor Decomposition for Learning Mixtures of Linear Systems
We study the problem of learning mixtures of linear dynamical systems (MLDS) from input-output data. The mixture setting allows us to leverage observations from related dynamical systems to improve the estimation of individual models. Building on spectral methods for mixtures of linear regressions, we propose a moment-based estimator that uses tensor decomposition to estimate the impulse response parameters of the mixture models. The estimator improves upon existing tensor decomposition approaches for MLDS by utilizing the entire length of the observed trajectories. We provide sample complexity bounds for estimating MLDS in the presence of noise, in terms of both the number of trajectories $N$ and the trajectory length $T$, and demonstrate the performance of the estimator through simulations.
comment: Accepted to 2025 Learning for Dynamics & Control Conference (L4DC) 2025. This is the full version. 38 pages
Robotics
Adaptive Traffic-Following Scheme for Orderly Distributed Control of Multi-Vehicle Systems
We present an adaptive control scheme to enable the emergence of order within distributed, autonomous multi-agent systems. Past studies showed that under high-density conditions, order generated from traffic-following behavior reduces travel times, while under low densities, choosing direct paths is more beneficial. In this paper, we leveraged those findings to allow aircraft to independently and dynamically adjust their degree of traffic-following behavior based on the current state of the airspace. This enables aircraft to follow other traffic only when beneficial. Quantitative analyses revealed that dynamic traffic-following behavior results in lower aircraft travel times at the cost of minimal levels of additional disorder to the airspace. The sensitivity of these benefits to temporal and spatial horizons was also investigated. Overall, this work highlights the benefits, and potential necessity, of incorporating self-organizing behavior in making distributed, autonomous multi-agent systems scalable.
AWML: An Open-Source ML-based Robotics Perception Framework to Deploy for ROS-based Autonomous Driving Software
In recent years, machine learning technologies have played an important role in robotics, particularly in the development of autonomous robots and self-driving vehicles. As the industry matures, robotics frameworks like ROS 2 have been developed and provides a broad range of applications from research to production. In this work, we introduce AWML, a framework designed to support MLOps for robotics. AWML provides a machine learning infrastructure for autonomous driving, supporting not only the deployment of trained models to robotic systems, but also an active learning pipeline that incorporates auto-labeling, semi-auto-labeling, and data mining techniques.
comment: 17 pages, 9 figures
Evaluating Robot Policies in a World Model
Robotics has broad applications from automating house chores to taking care of patients. However, evaluating robot control policies is challenging, as real-world testing is expensive, while handcrafted simulations often fail to accurately reflect real-world conditions, resulting in poor correlation between simulated evaluation and real-world outcomes. In this work, we investigate World-model-based Policy Evaluation (WPE). We first train an action-conditioned video generation model as a proxy to real-world environments. To enable efficient rollouts of hundreds of interactive steps while mitigating error accumulation in the world model, we propose an inference scheme which we call Blockwise-Autoregressive Diffusion Transformer with adjustable context and decoding horizon lengths. To ensure that the world model indeed follows action input, we propose metrics based on the agreement between the ground truth video and generated video conditioned on the same sequence of actions to evaluate the world model. We then use the world model for policy evaluation by performing Monte Carlo rollouts in the world model while employing a vision-language model (VLM) as a reward function. Interestingly, we found that WPE tends to underestimate the policy values for in-distribution actions and overestimate policy values for out-of-distribution actions. Nevertheless, WPE preserves the relative rankings of different policies. In emulating real robot executions, WPE achieves high fidelity in mimicing robot arm movements as in real videos, while emulating highly realistic object interaction remains challenging. Despite this limitation, we show that a world model can serve as a starting point for evaluating robot policies before real-world deployment.
comment: https://world-model-eval.github.io
Constrained Stein Variational Gradient Descent for Robot Perception, Planning, and Identification
Many core problems in robotics can be framed as constrained optimization problems. Often on these problems, the robotic system has uncertainty, or it would be advantageous to identify multiple high quality feasible solutions. To enable this, we present two novel frameworks for applying principles of constrained optimization to the new variational inference algorithm Stein variational gradient descent. Our general framework supports multiple types of constrained optimizers and can handle arbitrary constraints. We demonstrate on a variety of problems that we are able to learn to approximate distributions without violating constraints. Specifically, we show that we can build distributions of: robot motion plans that exactly avoid collisions, robot arm joint angles on the SE(3) manifold with exact table placement constraints, and object poses from point clouds with table placement constraints.
Using Diffusion Ensembles to Estimate Uncertainty for End-to-End Autonomous Driving
End-to-end planning systems for autonomous driving are improving rapidly, especially in closed-loop simulation environments like CARLA. Many such driving systems either do not consider uncertainty as part of the plan itself, or obtain it by using specialized representations that do not generalize. In this paper, we propose EnDfuser, an end-to-end driving system that uses a diffusion model as the trajectory planner. EnDfuser effectively leverages complex perception information like fused camera and LiDAR features, through combining attention pooling and trajectory planning into a single diffusion transformer module. Instead of committing to a single plan, EnDfuser produces a distribution of candidate trajectories (128 for our case) from a single perception frame through ensemble diffusion. By observing the full set of candidate trajectories, EnDfuser provides interpretability for uncertain, multi-modal future trajectory spaces, where there are multiple plausible options. EnDfuser achieves a competitive driving score of 70.1 on the Longest6 benchmark in CARLA with minimal concessions on inference speed. Our findings suggest that ensemble diffusion, used as a drop-in replacement for traditional point-estimate trajectory planning modules, can help improve the safety of driving decisions by modeling the uncertainty of the posterior trajectory distribution.
Flying Co-Stereo: Enabling Long-Range Aerial Dense Mapping via Collaborative Stereo Vision of Dynamic-Baseline
Lightweight long-range mapping is critical for safe navigation of UAV swarms in large-scale unknown environments. Traditional stereo vision systems with fixed short baselines face limited perception ranges. To address this, we propose Flying Co-Stereo, a cross-agent collaborative stereo vision system that leverages the wide-baseline spatial configuration of two UAVs for long-range dense mapping. Key innovations include: (1) a dual-spectrum visual-inertial-ranging estimator for robust baseline estimation; (2) a hybrid feature association strategy combining deep learning-based cross-agent matching and optical-flow-based intra-agent tracking; (3) A sparse-to-dense depth recovery scheme,refining dense monocular depth predictions using exponential fitting of long-range triangulated sparse landmarks for precise metric-scale mapping. Experiments demonstrate the Flying Co-Stereo system achieves dense 3D mapping up to 70 meters with 2.3%-9.7% relative error, outperforming conventional systems by up to 350% in depth range and 450% in coverage area. The project webpage: https://xingxingzuo.github.io/flying_co_stereo
Multi-Objective Neural Network Assisted Design Optimization of Soft Fin-Ray Grippers for Enhanced Grasping Performance
Soft Fin-Ray grippers can perform delicate and careful manipulation, which has caused notable attention in different fields. These grippers can handle objects of various forms and sizes safely. The internal structure of the Fin-Ray finger plays a significant role in its adaptability and grasping performance. However, modeling the non-linear grasp force and deformation behaviors for design purposes is challenging. Moreover, when the Fin-Ray finger becomes more rigid and capable of exerting higher forces, it becomes less delicate in handling objects. The contrast between these two objectives gives rise to a multi-objective optimization problem. In this study, we employ finite element method (FEM) to estimate the deflections and contact forces of the Fin-Ray, grasping cylindrical objects. This dataset is then used to construct a multilayer perception (MLP) for prediction of the contact force and the tip displacement. The FEM dataset consists of three input and four target features. The three input features of the MLP and optimization design variables are the thickness of the front and supporting beams, the thickness of the cross beams, and the equal spacing between the cross beams. In addition, the target features are the maximum contact forces and maximum tip displacements in x- and y-directions. The magnitude of maximum contact force and magnitude of maximum tip displacement are the two objectives, showing the trade-off between force and delicate manipulation in soft Fin-Ray grippers. Furthermore, the optimized set of solutions are found using multi-objective optimal techniques. We use non-dominated sorting genetic algorithm (NSGA-II) method for this purpose. Our findings demonstrate that our methodologies can be used to improve the design and gripping performance of soft robotic grippers, helping us to choose a design not only for delicate grasping but also for high-force applications.
Disturbance-Aware Adaptive Compensation in Hybrid Force-Position Locomotion Policy for Legged Robots
Reinforcement Learning (RL)-based methods have significantly improved the locomotion performance of legged robots. However, these motion policies face significant challenges when deployed in the real world. Robots operating in uncertain environments struggle to adapt to payload variations and external disturbances, resulting in severe degradation of motion performance. In this work, we propose a novel Hybrid Force-Position Locomotion Policy (HFPLP) learning framework, where the action space of the policy is defined as a combination of target joint positions and feedforward torques, enabling the robot to rapidly respond to payload variations and external disturbances. In addition, the proposed Disturbance-Aware Adaptive Compensation (DAAC) provides compensation actions in the torque space based on external disturbance estimation, enhancing the robot's adaptability to dynamic environmental changes. We validate our approach in both simulation and real-world deployment, demonstrating that it outperforms existing methods in carrying payloads and resisting disturbances.
comment: 8 pages, 12 figures
Diffusion Models for Increasing Accuracy in Olfaction Sensors and Datasets
Robotic odour source localization (OSL) is a critical capability for autonomous systems operating in complex environments. However, current OSL methods often suffer from ambiguities, particularly when robots misattribute odours to incorrect objects due to limitations in olfactory datasets and sensor resolutions. To address this challenge, we introduce a novel machine learning method using diffusion-based molecular generation to enhance odour localization accuracy that can be used by itself or with automated olfactory dataset construction pipelines with vision-language models (VLMs) This generative process of our diffusion model expands the chemical space beyond the limitations of both current olfactory datasets and the training data of VLMs, enabling the identification of potential odourant molecules not previously documented. The generated molecules can then be more accurately validated using advanced olfactory sensors which emulate human olfactory recognition through electronic sensor arrays. By integrating visual analysis, language processing, and molecular generation, our framework enhances the ability of olfaction-vision models on robots to accurately associate odours with their correct sources, thereby improving navigation and decision-making in environments where olfactory cues are essential. Our methodology represents a foundational advancement in the field of robotic olfaction, offering a scalable solution to the challenges posed by limited olfactory data and sensor ambiguities.
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
Real-world embodied agents face long-horizon tasks, characterized by high-level goals demanding multi-step solutions beyond single actions. Successfully navigating these requires both high-level task planning (i.e., decomposing goals into sub-tasks) and low-level motion control (i.e., generating precise robot actions). While existing vision language action (VLA) models and hierarchical architectures offer potential in embodied tasks, the former often falter in planning, and the latter can suffer from coordination issues, both hampering performance. We introduce a new unified VLA framework for long-horizon tasks, dubbed LoHoVLA, to overcome these limitations. LoHoVLA leverages a large pretrained vision language model (VLM) as the backbone to jointly generate language and action tokens for sub-task generation and robot action prediction, respectively. This shared representation promotes better generalization across tasks. Additionally, LoHoVLA embraces a hierarchical closed-loop control mechanism to mitigate errors originating from both high-level planning and low-level control. To train LoHoVLA, we introduce LoHoSet, a dataset built on the Ravens simulator, containing 20 long-horizon tasks, each with 1,000 expert demonstrations composed of visual observations, linguistic goals, sub-tasks, and robot actions. Experimental results show that LoHoVLA significantly surpasses both hierarchical and standard VLA approaches on long-horizon embodied tasks in the Ravens simulator. These findings underscore the promise of unified architectures for advancing generalizable embodied intelligence.
Position: Olfaction Standardization is Essential for the Advancement of Embodied Artificial Intelligence
Despite extraordinary progress in artificial intelligence (AI), modern systems remain incomplete representations of human cognition. Vision, audition, and language have received disproportionate attention due to well-defined benchmarks, standardized datasets, and consensus-driven scientific foundations. In contrast, olfaction - a high-bandwidth, evolutionarily critical sense - has been largely overlooked. This omission presents a foundational gap in the construction of truly embodied and ethically aligned super-human intelligence. We argue that the exclusion of olfactory perception from AI architectures is not due to irrelevance but to structural challenges: unresolved scientific theories of smell, heterogeneous sensor technologies, lack of standardized olfactory datasets, absence of AI-oriented benchmarks, and difficulty in evaluating sub-perceptual signal processing. These obstacles have hindered the development of machine olfaction despite its tight coupling with memory, emotion, and contextual reasoning in biological systems. In this position paper, we assert that meaningful progress toward general and embodied intelligence requires serious investment in olfactory research by the AI community. We call for cross-disciplinary collaboration - spanning neuroscience, robotics, machine learning, and ethics - to formalize olfactory benchmarks, develop multimodal datasets, and define the sensory capabilities necessary for machines to understand, navigate, and act within human environments. Recognizing olfaction as a core modality is essential not only for scientific completeness, but for building AI systems that are ethically grounded in the full scope of the human experience.
Tunable Virtual IMU Frame by Weighted Averaging of Multiple Non-Collocated IMUs
We present a new method to combine several rigidly connected but physically separated IMUs through a weighted average into a single virtual IMU (VIMU). This has the benefits of (i) reducing process noise through averaging, and (ii) allowing for tuning the location of the VIMU. The VIMU can be placed to be coincident with, for example, a camera frame or GNSS frame, thereby offering a quality-of-life improvement for users. Specifically, our VIMU removes the need to consider any lever-arm terms in the propagation model. We also present a quadratic programming method for selecting the weights to minimize the noise of the VIMU while still selecting the placement of its reference frame. We tested our method in simulation and validated it on a real dataset. The results show that our averaging technique works for IMUs with large separation and performance gain is observed in both the simulation and the real experiment compared to using only a single IMU.
Haptic Rapidly-Exploring Random Trees: A Sampling-based Planner for Quasi-static Manipulation Tasks
In this work, we explore how conventional motion planning algorithms can be reapplied to contact-rich manipulation tasks. Rather than focusing solely on efficiency, we investigate how manipulation aspects can be recast in terms of conventional motion-planning algorithms. Conventional motion planners, such as Rapidly-Exploring Random Trees (RRT), typically compute collision-free paths in configuration space. However, in manipulation tasks, intentional contact is often necessary. For example, when dealing with a crowded bookshelf, a robot must strategically push books aside before inserting a new one. In such scenarios, classical motion planners often fail because of insufficient space. As such, we presents Haptic Rapidly-Exploring Random Trees (HapticRRT), a planning algorithm that incorporates a recently proposed optimality measure in the context of \textit{quasi-static} manipulation, based on the (squared) Hessian of manipulation potential. The key contributions are i) adapting classical RRT to a framework that re-frames quasi-static manipulation as a planning problem on an implicit equilibrium manifold; ii) discovering multiple manipulation strategies, corresponding to branches of the equilibrium manifold. iii) providing deeper insight to haptic obstacle and haptic metric, enhancing interpretability. We validate our approach on a simulated pendulum and a real-world crowded bookshelf task, demonstrating its ability to autonomously discover strategic wedging-in policies and multiple branches. The video can be found at https://youtu.be/D-zpI0RznZ4
Music-driven Robot Swarm Painting
This paper proposes a novel control framework for robotic swarms capable of turning a musical input into a painting. The approach connects the two artistic domains, music and painting, leveraging their respective connections to fundamental emotions. The robotic units of the swarm are controlled in a coordinated fashion using a heterogeneous coverage policy to control the motion of the robots which continuously release traces of color in the environment. The results of extensive simulations performed starting from different musical inputs and with different color equipments are reported. Finally, the proposed framework has been implemented on real robots equipped with LED lights and capable of light-painting.
Falcon: Fast Visuomotor Policies via Partial Denoising
Diffusion policies are widely adopted in complex visuomotor tasks for their ability to capture multimodal action distributions. However, the multiple sampling steps required for action generation significantly harm real-time inference efficiency, which limits their applicability in real-time decision-making scenarios. Existing acceleration techniques either require retraining or degrade performance under low sampling steps. Here we propose Falcon, which mitigates this speed-performance trade-off and achieves further acceleration. The core insight is that visuomotor tasks exhibit sequential dependencies between actions. Falcon leverages this by reusing partially denoised actions from historical information rather than sampling from Gaussian noise at each step. By integrating current observations, Falcon reduces sampling steps while preserving performance. Importantly, Falcon is a training-free algorithm that can be applied as a plug-in to further improve decision efficiency on top of existing acceleration techniques. We validated Falcon in 48 simulated environments and 2 real-world robot experiments. demonstrating a 2-7x speedup with negligible performance degradation, offering a promising direction for efficient visuomotor policy design.
Fully Onboard SLAM for Distributed Mapping with a Swarm of Nano-Drones
The use of Unmanned Aerial Vehicles (UAVs) is rapidly increasing in applications ranging from surveillance and first-aid missions to industrial automation involving cooperation with other machines or humans. To maximize area coverage and reduce mission latency, swarms of collaborating drones have become a significant research direction. However, this approach requires open challenges in positioning, mapping, and communications to be addressed. This work describes a distributed mapping system based on a swarm of nano-UAVs, characterized by a limited payload of 35 g and tightly constrained onboard sensing and computing capabilities. Each nano-UAV is equipped with four 64-pixel depth sensors that measure the relative distance to obstacles in four directions. The proposed system merges the information from the swarm and generates a coherent grid map without relying on any external infrastructure. The data fusion is performed using the iterative closest point algorithm and a graph-based simultaneous localization and mapping algorithm, running entirely onboard the UAV's low-power ARM Cortex-M microcontroller with just 192 kB of memory. Field results gathered in three different mazes with a swarm of up to 4 nano-UAVs prove a mapping accuracy of 12 cm and demonstrate that the mapping time is inversely proportional to the number of agents. The proposed framework scales linearly in terms of communication bandwidth and onboard computational complexity, supporting communication between up to 20 nano-UAVs and mapping of areas up to 180 m2 with the chosen configuration requiring only 50 kB of memory.
comment: 18 pages
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Thus, policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, this exploration yields an 83.58% safety improvement compared to the current state-of-the-art method, while also maintaining task performance (+3.85%). (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios. (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
comment: 26 pages, 12 figures
Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs
Autonomous systems that operate in continuous state, action, and observation spaces require planning and reasoning under uncertainty. Existing online planning methods for such POMDPs are almost exclusively sample-based, yet they forego the power of high-dimensional gradient optimization as combining it into Monte Carlo Tree Search (MCTS) has proved difficult, especially in non-parametric settings. We close this gap with three contributions. First, we derive a novel action-gradient theorem for both MDPs and POMDPs in terms of transition likelihoods, making gradient information accessible during tree search. Second, we introduce the Multiple Importance Sampling (MIS) tree, that re-uses samples for changing action branches, yielding consistent value estimates that enable in-search gradient steps. Third, we derive exact transition probability computation via the area formula for smooth generative models common in physical domains, a result of independent interest. These elements combine into Action-Gradient Monte Carlo Tree Search (AGMCTS), the first planner to blend non-parametric particle search with online gradient refinement in POMDPs. Across several challenging continuous MDP and POMDP benchmarks, AGMCTS outperforms widely-used sample-only solvers in solution quality.
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation via Safety-as-Policy
Unthinking execution of human instructions in robotic manipulation can lead to severe safety risks, such as poisonings, fires, and even explosions. In this paper, we present responsible robotic manipulation, which requires robots to consider potential hazards in the real-world environment while completing instructions and performing complex operations safely and efficiently. However, such scenarios in real world are variable and risky for training. To address this challenge, we propose Safety-as-policy, which includes (i) a world model to automatically generate scenarios containing safety risks and conduct virtual interactions, and (ii) a mental model to infer consequences with reflections and gradually develop the cognition of safety, allowing robots to accomplish tasks while avoiding dangers. Additionally, we create the SafeBox synthetic dataset, which includes one hundred responsible robotic manipulation tasks with different safety risk scenarios and instructions, effectively reducing the risks associated with real-world experiments. Experiments demonstrate that Safety-as-policy can avoid risks and efficiently complete tasks in both synthetic dataset and real-world experiments, significantly outperforming baseline methods. Our SafeBox dataset shows consistent evaluation results with real-world scenarios, serving as a safe and effective benchmark for future research.
Active Multi-task Policy Fine-tuning
Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide which tasks should be demonstrated and how often? We study this multi-task problem and explore an interactive framework in which the agent adaptively selects the tasks to be demonstrated. We propose AMF (Active Multi-task Fine-tuning), an algorithm to maximize multi-task policy performance under a limited demonstration budget by collecting demonstrations yielding the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness to efficiently fine-tune neural policies in complex and high-dimensional environments.
LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments
The Iterative Closest Point (ICP) algorithm is a crucial component of LiDAR-based SLAM algorithms. However, its performance can be negatively affected in unstructured environments that lack features and geometric structures, leading to low accuracy and poor robustness in localization and mapping. It is known that degeneracy caused by the lack of geometric constraints can lead to errors in 6-DOF pose estimation along ill-conditioned directions. Therefore, there is a need for a broader and more fine-grained degeneracy detection and handling method. This paper proposes a new point cloud registration framework, LP-ICP, that combines point-to-line and point-to-plane distance metrics in the ICP algorithm, with localizability detection and handling. Rather than relying solely on point-to-plane localizability information, LP-ICP enhances the localizability analysis by incorporating a point-to-line metric, thereby exploiting richer geometric constraints. It consists of a localizability detection module and an optimization module. The localizability detection module performs localizability analysis by utilizing the correspondences between edge points (with low local smoothness) to lines and planar points (with high local smoothness) to planes between the scan and the map. The localizability contribution of individual correspondence constraints can be applied to a broader range. The optimization module adds additional soft and hard constraints to the optimization equations based on the localizability category. This allows the pose to be constrained along ill-conditioned directions. The proposed method is evaluated on simulation and real-world datasets, showing comparable or better accuracy than the state-of-the art methods in tested scenarios. Observed variations in partially localizable directions suggest the need for further investigation on robustness and generalizability.
comment: 18 Pages, 9 Figures
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis
Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis Augmentation (VISTA) to understand their capabilities for learning viewpoint-invariant policies from single-viewpoint demonstration data. Upon evaluating the robustness of policies trained with our method to out-of-distribution camera viewpoints, we find that they outperform baselines in both simulated and real-world manipulation tasks. Videos and additional visualizations are available at https://s-tian.github.io/projects/vista.
comment: Accepted to CoRL 2024
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
comment: 9 pages, 8 figures, 5 tables
SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation
Accurate instrument pose estimation is a crucial step towards the future of robotic surgery, enabling applications such as autonomous surgical task execution. Vision-based methods for surgical instrument pose estimation provide a practical approach to tool tracking, but they often require markers to be attached to the instruments. Recently, more research has focused on the development of marker-less methods based on deep learning. However, acquiring realistic surgical data, with ground truth instrument poses, required for deep learning training, is challenging. To address the issues in surgical instrument pose estimation, we introduce the Surgical Robot Instrument Pose Estimation (SurgRIPE) challenge, hosted at the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The objectives of this challenge are: (1) to provide the surgical vision community with realistic surgical video data paired with ground truth instrument poses, and (2) to establish a benchmark for evaluating markerless pose estimation methods. The challenge led to the development of several novel algorithms that showcased improved accuracy and robustness over existing methods. The performance evaluation study on the SurgRIPE dataset highlights the potential of these advanced algorithms to be integrated into robotic surgery systems, paving the way for more precise and autonomous surgical procedures. The SurgRIPE challenge has successfully established a new benchmark for the field, encouraging further research and development in surgical robot instrument pose estimation.
comment: 35 pages, 18 figures, journal paper
Multiagent Systems
Adaptive Traffic-Following Scheme for Orderly Distributed Control of Multi-Vehicle Systems
We present an adaptive control scheme to enable the emergence of order within distributed, autonomous multi-agent systems. Past studies showed that under high-density conditions, order generated from traffic-following behavior reduces travel times, while under low densities, choosing direct paths is more beneficial. In this paper, we leveraged those findings to allow aircraft to independently and dynamically adjust their degree of traffic-following behavior based on the current state of the airspace. This enables aircraft to follow other traffic only when beneficial. Quantitative analyses revealed that dynamic traffic-following behavior results in lower aircraft travel times at the cost of minimal levels of additional disorder to the airspace. The sensitivity of these benefits to temporal and spatial horizons was also investigated. Overall, this work highlights the benefits, and potential necessity, of incorporating self-organizing behavior in making distributed, autonomous multi-agent systems scalable.
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements. This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively $\textit{generalize}$ to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis. We introduce $\textbf{Recon}$ ($\textbf{R}$easoning like an $\textbf{ECON}$omist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality. These results underscore the promise of domain-aligned post-training for enhancing reasoning and agent alignment, shedding light on the roles of SFT and RL in shaping model behavior. Code is available at https://github.com/MasterZhou1/Recon .
Two-Sided Manipulation Games in Stable Matching Markets
The Deferred Acceptance (DA) algorithm is an elegant procedure for finding a stable matching in two-sided matching markets. It ensures that no pair of agents prefers each other to their matched partners. In this work, we initiate the study of two-sided manipulations in matching markets as non-cooperative games. We introduce the accomplice manipulation game, where a man misreports to help a specific woman obtain a better partner, whenever possible. We provide a polynomial time algorithm for finding a pure strategy Nash equilibrium (NE) and show that our algorithm always yields a stable matching - although not every Nash equilibrium corresponds to a stable matching. Additionally, we show how our analytical techniques for the accomplice manipulation game can be applied to other manipulation games in matching markets, such as one-for-many and the standard self-manipulation games. We complement our theoretical findings with empirical evaluations of different properties of the resulting NE, such as the welfare of the agents.
Reinforcement Learning for Hanabi
Hanabi has become a popular game for research when it comes to reinforcement learning (RL) as it is one of the few cooperative card games where you have incomplete knowledge of the entire environment, thus presenting a challenge for a RL agent. We explored different tabular and deep reinforcement learning algorithms to see which had the best performance both against an agent of the same type and also against other types of agents. We establish that certain agents played their highest scoring games against specific agents while others exhibited higher scores on average by adapting to the opposing agent's behavior. We attempted to quantify the conditions under which each algorithm provides the best advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and balancing of play types compared to tabular agents. Specifically, tabular Expected SARSA and deep Q-Learning agents showed the best performance.
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multi-step editing. We present $\textbf{LayerCraft}$, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) $\textit{structured generation}$ from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) $\textit{layered object integration}$, allowing users to insert and customize objects -- such as characters or props -- across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, the $\textbf{ChainArchitect}$ for CoT-driven layout planning, and the $\textbf{Object Integration Network (OIN)}$ for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort. Code will be released at https://github.com/PeterYYZhang/LayerCraft.
comment: 26 pages
Ensuring Truthfulness in Distributed Aggregative Optimization
Distributed aggregative optimization methods are gaining increased traction due to their ability to address cooperative control and optimization problems, where the objective function of each agent depends not only on its own decision variable but also on the aggregation of other agents' decision variables. Nevertheless, existing distributed aggregative optimization methods implicitly assume all agents to be truthful in information sharing, which can be unrealistic in real-world scenarios, where agents may act selfishly or strategically. In fact, an opportunistic agent may deceptively share false information in its own favor to minimize its own loss, which, however, will compromise the network-level global performance. To solve this issue, we propose a new distributed aggregative optimization algorithm that can ensure truthfulness of agents and convergence performance. To the best of our knowledge, this is the first algorithm that ensures truthfulness in a fully distributed setting, where no "centralized" aggregator exists to collect private information/decision variables from participating agents. We systematically characterize the convergence rate of our algorithm under nonconvex/convex/strongly convex objective functions, which generalizes existing distributed aggregative optimization results that only focus on convex objective functions. We also rigorously quantify the tradeoff between convergence performance and the level of enabled truthfulness under different convexity conditions. Numerical simulations using distributed charging of electric vehicles confirm the efficacy of our algorithm.
comment: 20 pages, 7 figures
Systems and Control (CS)
Bi-Level optimization for parameter estimation of differential equations using interpolation
Inverse problem or parameter estimation of ordinary differential equations is a process of obtaining the best parameters using experimental measurements of the states. Single (Multiple)-shooting is a type of sequential optimization method that minimizes the error in the measured and numerically integrated states. However, this requires computing sensitivities i.e. the derivatives of states with respect to the parameters over the numerical integrator, which can get computationally expensive. To address this challenge, many interpolation-based approaches have been proposed to either reduce the computational cost of sensitivity calculations or eliminate their need. In this paper, we use a bi-level optimization framework that leverages interpolation and exploits the structure of the differential equation to solve an inner convex optimization problem. We apply this method to two different problem formulations. First, parameter estimation for differential equations, and delayed differential equations, where the model structure is known but the parameters are unknown. Second, model discovery problems, where both the model structure and parameters are unknown.
Adaptive Traffic-Following Scheme for Orderly Distributed Control of Multi-Vehicle Systems
We present an adaptive control scheme to enable the emergence of order within distributed, autonomous multi-agent systems. Past studies showed that under high-density conditions, order generated from traffic-following behavior reduces travel times, while under low densities, choosing direct paths is more beneficial. In this paper, we leveraged those findings to allow aircraft to independently and dynamically adjust their degree of traffic-following behavior based on the current state of the airspace. This enables aircraft to follow other traffic only when beneficial. Quantitative analyses revealed that dynamic traffic-following behavior results in lower aircraft travel times at the cost of minimal levels of additional disorder to the airspace. The sensitivity of these benefits to temporal and spatial horizons was also investigated. Overall, this work highlights the benefits, and potential necessity, of incorporating self-organizing behavior in making distributed, autonomous multi-agent systems scalable.
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
Federated learning framework for collaborative remaining useful life prognostics: an aircraft engine case study
Complex systems such as aircraft engines are continuously monitored by sensors. In predictive aircraft maintenance, the collected sensor measurements are used to estimate the health condition and the Remaining Useful Life (RUL) of such systems. However, a major challenge when developing prognostics is the limited number of run-to-failure data samples. This challenge could be overcome if multiple airlines would share their run-to-failure data samples such that sufficient learning can be achieved. Due to privacy concerns, however, airlines are reluctant to share their data in a centralized setting. In this paper, a collaborative federated learning framework is therefore developed instead. Here, several airlines cooperate to train a collective RUL prognostic machine learning model, without the need to centrally share their data. For this, a decentralized validation procedure is proposed to validate the prognostics model without sharing any data. Moreover, sensor data is often noisy and of low quality. This paper therefore proposes four novel methods to aggregate the parameters of the global prognostic model. These methods enhance the robustness of the FL framework against noisy data. The proposed framework is illustrated for training a collaborative RUL prognostic model for aircraft engines, using the N-CMAPSS dataset. Here, six airlines are considered, that collaborate in the FL framework to train a collective RUL prognostic model for their aircraft's engines. When comparing the proposed FL framework with the case where each airline independently develops their own prognostic model, the results show that FL leads to more accurate RUL prognostics for five out of the six airlines. Moreover, the novel robust aggregation methods render the FL framework robust to noisy data samples.
Sensor Fusion Methods for Gaussian Mixture Models
Consensus is a popular technique for distributed state estimation. This formulation allows networks of connected agents or sensors to exchange information about the distribution of a set of targets with their immediate neighbors without the need of a centralized node or layer. We present decentralized consensus-based fusion techniques for a system whose target prior estimates are a weighted mixture of Gaussian probability density functions (PDFs) for the following cases: 1) in which all agents have the same a priori Gaussian mixture estimate of the target, and 2) in which agents have different a priori Gaussian mixture estimates of the target. For the second case, we present a formulation that fuses each agent's a priori estimate without using local observations such that each agent's posterior estimate is the same across the network.
comment: 8 pages, to be modified and reduced in size for MFI 2025
A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications
Spiking Neural Networks (SNNs) offer a biologically inspired computational paradigm, enabling energy-efficient data processing through spike-based information transmission. Despite notable advancements in hardware for SNNs, spike encoding has largely remained software-dependent, limiting efficiency. This paper addresses the need for adaptable and resource-efficient spike encoding hardware by presenting an area-optimized hardware implementation of the Spiketrum algorithm, which encodes time-varying analogue signals into spatiotemporal spike patterns. Unlike earlier performance-optimized designs, which prioritize speed, our approach focuses on reducing hardware footprint, achieving a 52% reduction in Block RAMs (BRAMs), 31% fewer Digital Signal Processing (DSP) slices, and a 6% decrease in Look-Up Tables (LUTs). The proposed implementation has been verified on an FPGA and successfully integrated into an IC using TSMC180 technology. Experimental results demonstrate the system's effectiveness in real-world applications, including sound and ECG classification. This work highlights the trade-offs between performance and resource efficiency, offering a flexible, scalable solution for neuromorphic systems in power-sensitive applications like cochlear implants and neural devices.
comment: Currently under review with IEEE TCAS
Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"
Spike-based encoders represent information as sequences of spikes or pulses, which are transmitted between neurons. A prevailing consensus suggests that spike-based approaches demonstrate exceptional capabilities in capturing the temporal dynamics of neural activity and have the potential to provide energy-efficient solutions for low-power applications. The Spiketrum encoder efficiently compresses input data using spike trains or code sets (for non-spiking applications) and is adaptable to both hardware and software implementations, with lossless signal reconstruction capability. The paper proposes and assesses Spiketrum's hardware, evaluating its output under varying spike rates and its classification performance with popular spiking and non-spiking classifiers, and also assessing the quality of information compression and hardware resource utilization. The paper extensively benchmarks both Spiketrum hardware and its software counterpart against state-of-the-art, biologically-plausible encoders. The evaluations encompass benchmarking criteria, including classification accuracy, training speed, and sparsity when using encoder outputs in pattern recognition and classification with both spiking and non-spiking classifiers. Additionally, they consider encoded output entropy and hardware resource utilization and power consumption of the hardware version of the encoders. Results demonstrate Spiketrum's superiority in most benchmarking criteria, making it a promising choice for various applications. It efficiently utilizes hardware resources with low power consumption, achieving high classification accuracy. This work also emphasizes the potential of encoders in spike-based processing to improve the efficiency and performance of neural computing systems.
comment: To be published at "IEEE/ACM Transactions on Audio, Speech, and Language Processing"
The value of hedging against energy storage uncertainties when designing energy parks
Energy storage is needed to match renewable generation to industrial loads in energy parks. However, the future performance of bulk storage technologies is currently highly uncertain. Due to the urgency of decarbonization targets, energy park projects must be designed and begun now. But, as uncertainty in storage performance reduces, a different technology than identified during initial design may turn out cheaper. Enabling flexibility so that design adaptations can be made as better information becomes available would lower the cost of decarbonizing industry. But having this flexibility is itself costly. This raises the question, "Is it worth it?" This study quantifies the benefit of retaining flexibility to adapt energy park designs and optionality over storage technology choice as uncertainty reduces, to determine whether it is economically worthwhile. It applies the Value of Information analysis framework to the sizing of wind, solar, and storage in an illustrative energy park model based on a real-world proposal near Rotterdam, considering uncertainty in storage efficiency, lifetime, and capital cost. Updating asset sizings after storage uncertainty reduced is found to reduce total costs by 18% on average. Having the option to switch storage technology choice as well reduces costs by a further 13%, which is substantially greater than the cost of providing storage optionality. Using two storage technologies in the energy park reduces costs by 14%, and in this case storage optionality is not worthwhile. These results are robust to the level of uncertainty reduction in storage performance, and the risk aversion of the system designer.
comment: 35 pages, 10 figures, 10 tables
Simulation Results of Center-Manifold-Based Identification of Polynomial Nonlinear Systems with Uncontrollable Linearization
Recently, a system identification method based on center manifold is proposed to identify polynomial nonlinear systems with uncontrollable linearization. This note presents a numerical example to show the effectiveness of this method.
Systems and Control (EESS)
Bi-Level optimization for parameter estimation of differential equations using interpolation
Inverse problem or parameter estimation of ordinary differential equations is a process of obtaining the best parameters using experimental measurements of the states. Single (Multiple)-shooting is a type of sequential optimization method that minimizes the error in the measured and numerically integrated states. However, this requires computing sensitivities i.e. the derivatives of states with respect to the parameters over the numerical integrator, which can get computationally expensive. To address this challenge, many interpolation-based approaches have been proposed to either reduce the computational cost of sensitivity calculations or eliminate their need. In this paper, we use a bi-level optimization framework that leverages interpolation and exploits the structure of the differential equation to solve an inner convex optimization problem. We apply this method to two different problem formulations. First, parameter estimation for differential equations, and delayed differential equations, where the model structure is known but the parameters are unknown. Second, model discovery problems, where both the model structure and parameters are unknown.
Adaptive Traffic-Following Scheme for Orderly Distributed Control of Multi-Vehicle Systems
We present an adaptive control scheme to enable the emergence of order within distributed, autonomous multi-agent systems. Past studies showed that under high-density conditions, order generated from traffic-following behavior reduces travel times, while under low densities, choosing direct paths is more beneficial. In this paper, we leveraged those findings to allow aircraft to independently and dynamically adjust their degree of traffic-following behavior based on the current state of the airspace. This enables aircraft to follow other traffic only when beneficial. Quantitative analyses revealed that dynamic traffic-following behavior results in lower aircraft travel times at the cost of minimal levels of additional disorder to the airspace. The sensitivity of these benefits to temporal and spatial horizons was also investigated. Overall, this work highlights the benefits, and potential necessity, of incorporating self-organizing behavior in making distributed, autonomous multi-agent systems scalable.
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
Federated learning framework for collaborative remaining useful life prognostics: an aircraft engine case study
Complex systems such as aircraft engines are continuously monitored by sensors. In predictive aircraft maintenance, the collected sensor measurements are used to estimate the health condition and the Remaining Useful Life (RUL) of such systems. However, a major challenge when developing prognostics is the limited number of run-to-failure data samples. This challenge could be overcome if multiple airlines would share their run-to-failure data samples such that sufficient learning can be achieved. Due to privacy concerns, however, airlines are reluctant to share their data in a centralized setting. In this paper, a collaborative federated learning framework is therefore developed instead. Here, several airlines cooperate to train a collective RUL prognostic machine learning model, without the need to centrally share their data. For this, a decentralized validation procedure is proposed to validate the prognostics model without sharing any data. Moreover, sensor data is often noisy and of low quality. This paper therefore proposes four novel methods to aggregate the parameters of the global prognostic model. These methods enhance the robustness of the FL framework against noisy data. The proposed framework is illustrated for training a collaborative RUL prognostic model for aircraft engines, using the N-CMAPSS dataset. Here, six airlines are considered, that collaborate in the FL framework to train a collective RUL prognostic model for their aircraft's engines. When comparing the proposed FL framework with the case where each airline independently develops their own prognostic model, the results show that FL leads to more accurate RUL prognostics for five out of the six airlines. Moreover, the novel robust aggregation methods render the FL framework robust to noisy data samples.
Sensor Fusion Methods for Gaussian Mixture Models
Consensus is a popular technique for distributed state estimation. This formulation allows networks of connected agents or sensors to exchange information about the distribution of a set of targets with their immediate neighbors without the need of a centralized node or layer. We present decentralized consensus-based fusion techniques for a system whose target prior estimates are a weighted mixture of Gaussian probability density functions (PDFs) for the following cases: 1) in which all agents have the same a priori Gaussian mixture estimate of the target, and 2) in which agents have different a priori Gaussian mixture estimates of the target. For the second case, we present a formulation that fuses each agent's a priori estimate without using local observations such that each agent's posterior estimate is the same across the network.
comment: 8 pages, to be modified and reduced in size for MFI 2025
A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications
Spiking Neural Networks (SNNs) offer a biologically inspired computational paradigm, enabling energy-efficient data processing through spike-based information transmission. Despite notable advancements in hardware for SNNs, spike encoding has largely remained software-dependent, limiting efficiency. This paper addresses the need for adaptable and resource-efficient spike encoding hardware by presenting an area-optimized hardware implementation of the Spiketrum algorithm, which encodes time-varying analogue signals into spatiotemporal spike patterns. Unlike earlier performance-optimized designs, which prioritize speed, our approach focuses on reducing hardware footprint, achieving a 52% reduction in Block RAMs (BRAMs), 31% fewer Digital Signal Processing (DSP) slices, and a 6% decrease in Look-Up Tables (LUTs). The proposed implementation has been verified on an FPGA and successfully integrated into an IC using TSMC180 technology. Experimental results demonstrate the system's effectiveness in real-world applications, including sound and ECG classification. This work highlights the trade-offs between performance and resource efficiency, offering a flexible, scalable solution for neuromorphic systems in power-sensitive applications like cochlear implants and neural devices.
comment: Currently under review with IEEE TCAS
Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"
Spike-based encoders represent information as sequences of spikes or pulses, which are transmitted between neurons. A prevailing consensus suggests that spike-based approaches demonstrate exceptional capabilities in capturing the temporal dynamics of neural activity and have the potential to provide energy-efficient solutions for low-power applications. The Spiketrum encoder efficiently compresses input data using spike trains or code sets (for non-spiking applications) and is adaptable to both hardware and software implementations, with lossless signal reconstruction capability. The paper proposes and assesses Spiketrum's hardware, evaluating its output under varying spike rates and its classification performance with popular spiking and non-spiking classifiers, and also assessing the quality of information compression and hardware resource utilization. The paper extensively benchmarks both Spiketrum hardware and its software counterpart against state-of-the-art, biologically-plausible encoders. The evaluations encompass benchmarking criteria, including classification accuracy, training speed, and sparsity when using encoder outputs in pattern recognition and classification with both spiking and non-spiking classifiers. Additionally, they consider encoded output entropy and hardware resource utilization and power consumption of the hardware version of the encoders. Results demonstrate Spiketrum's superiority in most benchmarking criteria, making it a promising choice for various applications. It efficiently utilizes hardware resources with low power consumption, achieving high classification accuracy. This work also emphasizes the potential of encoders in spike-based processing to improve the efficiency and performance of neural computing systems.
comment: To be published at "IEEE/ACM Transactions on Audio, Speech, and Language Processing"
The value of hedging against energy storage uncertainties when designing energy parks
Energy storage is needed to match renewable generation to industrial loads in energy parks. However, the future performance of bulk storage technologies is currently highly uncertain. Due to the urgency of decarbonization targets, energy park projects must be designed and begun now. But, as uncertainty in storage performance reduces, a different technology than identified during initial design may turn out cheaper. Enabling flexibility so that design adaptations can be made as better information becomes available would lower the cost of decarbonizing industry. But having this flexibility is itself costly. This raises the question, "Is it worth it?" This study quantifies the benefit of retaining flexibility to adapt energy park designs and optionality over storage technology choice as uncertainty reduces, to determine whether it is economically worthwhile. It applies the Value of Information analysis framework to the sizing of wind, solar, and storage in an illustrative energy park model based on a real-world proposal near Rotterdam, considering uncertainty in storage efficiency, lifetime, and capital cost. Updating asset sizings after storage uncertainty reduced is found to reduce total costs by 18% on average. Having the option to switch storage technology choice as well reduces costs by a further 13%, which is substantially greater than the cost of providing storage optionality. Using two storage technologies in the energy park reduces costs by 14%, and in this case storage optionality is not worthwhile. These results are robust to the level of uncertainty reduction in storage performance, and the risk aversion of the system designer.
comment: 35 pages, 10 figures, 10 tables
Simulation Results of Center-Manifold-Based Identification of Polynomial Nonlinear Systems with Uncontrollable Linearization
Recently, a system identification method based on center manifold is proposed to identify polynomial nonlinear systems with uncontrollable linearization. This note presents a numerical example to show the effectiveness of this method.
Robotics
PB&J: Peanut Butter and Joints for Damped Articulation
Many bioinspired robots mimic the rigid articulated joint structure of the human hand for grasping tasks, but experience high-frequency mechanical perturbations that can destabilize the system and negatively affect precision without a high-frequency controller. Despite having bandwidth-limited controllers that experience time delays between sensing and actuation, biological systems can respond successfully to and mitigate these high-frequency perturbations. Human joints include damping and stiffness that many rigid articulated bioinspired hand robots lack. To enable researchers to explore the effects of joint viscoelasticity in joint control, we developed a human-hand-inspired grasping robot with viscoelastic structures that utilizes accessible and bioderived materials to reduce the economic and environmental impact of prototyping novel robotic systems. We demonstrate that an elastic element at the finger joints is necessary to achieve concurrent flexion, which enables secure grasping of spherical objects. To significantly damp the manufactured finger joints, we modeled, manufactured, and characterized rotary dampers using peanut butter as an organic analog joint working fluid. Finally, we demonstrated that a real-time position-based controller could be used to successfully catch a lightweight falling ball. We developed this open-source, low-cost grasping platform that abstracts the morphological and mechanical properties of the human hand to enable researchers to explore questions about biomechanics in roboto that would otherwise be difficult to test in simulation or modeling.
comment: to be published in Living Machines 2025 Proceedings
DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation
We study the problem of functional retargeting: learning dexterous manipulation policies to track object states from human hand-object demonstrations. We focus on long-horizon, bimanual tasks with articulated objects, which is challenging due to large action space, spatiotemporal discontinuities, and embodiment gap between human and robot hands. We propose DexMachina, a novel curriculum-based algorithm: the key idea is to use virtual object controllers with decaying strength: an object is first driven automatically towards its target states, such that the policy can gradually learn to take over under motion and contact guidance. We release a simulation benchmark with a diverse set of tasks and dexterous hands, and show that DexMachina significantly outperforms baseline methods. Our algorithm and benchmark enable a functional comparison for hardware designs, and we present key findings informed by quantitative and qualitative results. With the recent surge in dexterous hand development, we hope this work will provide a useful platform for identifying desirable hardware capabilities and lower the barrier for contributing to future research. Videos and more at https://project-dexmachina.github.io/
Bi-Manual Joint Camera Calibration and Scene Representation
Robot manipulation, especially bimanual manipulation, often requires setting up multiple cameras on multiple robot manipulators. Before robot manipulators can generate motion or even build representations of their environments, the cameras rigidly mounted to the robot need to be calibrated. Camera calibration is a cumbersome process involving collecting a set of images, with each capturing a pre-determined marker. In this work, we introduce the Bi-Manual Joint Calibration and Representation Framework (Bi-JCR). Bi-JCR enables multiple robot manipulators, each with cameras mounted, to circumvent taking images of calibration markers. By leveraging 3D foundation models for dense, marker-free multi-view correspondence, Bi-JCR jointly estimates: (i) the extrinsic transformation from each camera to its end-effector, (ii) the inter-arm relative poses between manipulators, and (iii) a unified, scale-consistent 3D representation of the shared workspace, all from the same captured RGB image sets. The representation, jointly constructed from images captured by cameras on both manipulators, lives in a common coordinate frame and supports collision checking and semantic segmentation to facilitate downstream bimanual coordination tasks. We empirically evaluate the robustness of Bi-JCR on a variety of tabletop environments, and demonstrate its applicability on a variety of downstream tasks.
RealDrive: Retrieval-Augmented Driving with Diffusion Models
Learning-based planners generate natural human-like driving behaviors by learning to reason about nuanced interactions from data, overcoming the rigid behaviors that arise from rule-based planners. Nonetheless, data-driven approaches often struggle with rare, safety-critical scenarios and offer limited controllability over the generated trajectories. To address these challenges, we propose RealDrive, a Retrieval-Augmented Generation (RAG) framework that initializes a diffusion-based planning policy by retrieving the most relevant expert demonstrations from the training dataset. By interpolating between current observations and retrieved examples through a denoising process, our approach enables fine-grained control and safe behavior across diverse scenarios, leveraging the strong prior provided by the retrieved scenario. Another key insight we produce is that a task-relevant retrieval model trained with planning-based objectives results in superior planning performance in our framework compared to a task-agnostic retriever. Experimental results demonstrate improved generalization to long-tail events and enhanced trajectory diversity compared to standard learning-based planners -- we observe a 40% reduction in collision rate on the Waymo Open Motion dataset with RAG.
DiG-Net: Enhancing Quality of Life through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics
Dynamic hand gestures play a pivotal role in assistive human-robot interaction (HRI), facilitating intuitive, non-verbal communication, particularly for individuals with mobility constraints or those operating robots remotely. Current gesture recognition methods are mostly limited to short-range interactions, reducing their utility in scenarios demanding robust assistive communication from afar. In this paper, we introduce a novel approach designed specifically for assistive robotics, enabling dynamic gesture recognition at extended distances of up to 30 meters, thereby significantly improving accessibility and quality of life. Our proposed Distance-aware Gesture Network (DiG-Net) effectively combines Depth-Conditioned Deformable Alignment (DADA) blocks with Spatio-Temporal Graph modules, enabling robust processing and classification of gesture sequences captured under challenging conditions, including significant physical attenuation, reduced resolution, and dynamic gesture variations commonly experienced in real-world assistive environments. We further introduce the Radiometric Spatio-Temporal Depth Attenuation Loss (RSTDAL), shown to enhance learning and strengthen model robustness across varying distances. Our model demonstrates significant performance improvement over state-of-the-art gesture recognition frameworks, achieving a recognition accuracy of 97.3% on a diverse dataset with challenging hyper-range gestures. By effectively interpreting gestures from considerable distances, DiG-Net significantly enhances the usability of assistive robots in home healthcare, industrial safety, and remote assistance scenarios, enabling seamless and intuitive interactions for users regardless of physical limitations
comment: arXiv admin note: substantial text overlap with arXiv:2411.18413
EL-AGHF: Extended Lagrangian Affine Geometric Heat Flow
We propose a constrained Affine Geometric Heat Flow (AGHF) method that evolves so as to suppress the dynamics gaps associated with inadmissible control directions. AGHF provides a unified framework applicable to a wide range of motion planning problems, including both holonomic and non-holonomic systems. However, to generate admissible trajectories, it requires assigning infinite penalties to inadmissible control directions. This design choice, while theoretically valid, often leads to high computational cost or numerical instability when the penalty becomes excessively large. To overcome this limitation, we extend AGHF in an Augmented Lagrangian method approach by introducing a dual trajectory related to dynamics gaps in inadmissible control directions. This method solves the constrained variational problem as an extended parabolic partial differential equation defined over both the state and dual trajectorys, ensuring the admissibility of the resulting trajectory. We demonstrate the effectiveness of our algorithm through simulation examples.
comment: 6 pages, 4 figures
Black-box Adversarial Attacks on CNN-based SLAM Algorithms
Continuous advancements in deep learning have led to significant progress in feature detection, resulting in enhanced accuracy in tasks like Simultaneous Localization and Mapping (SLAM). Nevertheless, the vulnerability of deep neural networks to adversarial attacks remains a challenge for their reliable deployment in applications, such as navigation of autonomous agents. Even though CNN-based SLAM algorithms are a growing area of research there is a notable absence of a comprehensive presentation and examination of adversarial attacks targeting CNN-based feature detectors, as part of a SLAM system. Our work introduces black-box adversarial perturbations applied to the RGB images fed into the GCN-SLAM algorithm. Our findings on the TUM dataset [30] reveal that even attacks of moderate scale can lead to tracking failure in as many as 76% of the frames. Moreover, our experiments highlight the catastrophic impact of attacking depth instead of RGB input images on the SLAM system.
comment: 9 pages, 8 figures
System-integrated intrinsic static-dynamic pressure sensing enabled by charge excitation and 3D gradient engineering for autonomous robotic interaction
High-resolution pressure sensing that distinguishes static and dynamic inputs is vital for intelligent robotics but remains challenging for self-powered sensors. We present a self-powered intrinsic static-dynamic pressure sensor (iSD Sensor) that integrates charge excitation with a 3D gradient-engineered structure, achieving enhanced voltage outputs-over 25X for static and 15X for dynamic modes. The sensor exhibits multi-region sensitivities (up to 34.7 V/kPa static, 48.4 V/kPa dynamic), a low detection limit of 6.13 Pa, and rapid response/recovery times (83/43 ms). This design enables nuanced tactile perception and supports dual-mode robotic control: proportional actuation via static signals and fast triggering via dynamic inputs. Integrated into a wireless closed-loop system, the iSD Sensor enables precise functions such as finger bending, object grasping, and sign language output.
How can AI reduce wrist injuries in the workplace?
This paper explores the development of a control and sensor strategy for an industrial wearable wrist exoskeleton by classifying and predicting workers' actions. The study evaluates the correlation between exerted force and effort intensity, along with sensor strategy optimization, for designing purposes. Using data from six healthy subjects in a manufacturing plant, this paper presents EMG-based models for wrist motion classification and force prediction. Wrist motion recognition is achieved through a pattern recognition algorithm developed with surface EMG data from an 8-channel EMG sensor (Myo Armband); while a force regression model uses wrist and hand force measurements from a commercial handheld dynamometer (Vernier GoDirect Hand Dynamometer). This control strategy forms the foundation for a streamlined exoskeleton architecture designed for industrial applications, focusing on simplicity, reduced costs, and minimal sensor use while ensuring reliable and effective assistance.
comment: Pages 569-580
Reactive Aerobatic Flight via Reinforcement Learning
Quadrotors have demonstrated remarkable versatility, yet their full aerobatic potential remains largely untapped due to inherent underactuation and the complexity of aggressive maneuvers. Traditional approaches, separating trajectory optimization and tracking control, suffer from tracking inaccuracies, computational latency, and sensitivity to initial conditions, limiting their effectiveness in dynamic, high-agility scenarios. Inspired by recent breakthroughs in data-driven methods, we propose a reinforcement learning-based framework that directly maps drone states and aerobatic intentions to control commands, eliminating modular separation to enable quadrotors to perform end-to-end policy optimization for extreme aerobatic maneuvers. To ensure efficient and stable training, we introduce an automated curriculum learning strategy that dynamically adjusts aerobatic task difficulty. Enabled by domain randomization for robust zero-shot sim-to-real transfer, our approach is validated in demanding real-world experiments, including the first demonstration of a drone autonomously performing continuous inverted flight while reactively navigating a moving gate, showcasing unprecedented agility.
comment: This work has been submitted to RAL and is under review
SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation
Reliable planning is crucial for achieving autonomous driving. Rule-based planners are efficient but lack generalization, while learning-based planners excel in generalization yet have limitations in real-time performance and interpretability. In long-tail scenarios, these challenges make planning particularly difficult. To leverage the strengths of both rule-based and learning-based planners, we proposed the Scenario-Aware Hybrid Planner (SAH-Drive) for closed-loop vehicle trajectory planning. Inspired by human driving behavior, SAH-Drive combines a lightweight rule-based planner and a comprehensive learning-based planner, utilizing a dual-timescale decision neuron to determine the final trajectory. To enhance the computational efficiency and robustness of the hybrid planner, we also employed a diffusion proposal number regulator and a trajectory fusion module. The experimental results show that the proposed method significantly improves the generalization capability of the planning system, achieving state-of-the-art performance in interPlan, while maintaining computational efficiency without incurring substantial additional runtime.
comment: 17 pages, 8 figures, International Conference on Machine Learning
MagicGripper: A Multimodal Sensor-Integrated Gripper for Contact-Rich Robotic Manipulation
Contact-rich manipulation in unstructured environments demands precise, multimodal perception to enable robust and adaptive control. Vision-based tactile sensors (VBTSs) have emerged as an effective solution; however, conventional VBTSs often face challenges in achieving compact, multi-modal functionality due to hardware constraints and algorithmic complexity. In this work, we present MagicGripper, a multimodal sensor-integrated gripper designed for contact-rich robotic manipulation. Building on our prior design, MagicTac, we develop a compact variant, mini-MagicTac, which features a three-dimensional, multi-layered grid embedded in a soft elastomer. MagicGripper integrates mini-MagicTac, enabling high-resolution tactile feedback alongside proximity and visual sensing within a compact, gripper-compatible form factor. We conduct a thorough evaluation of mini-MagicTac's performance, demonstrating its capabilities in spatial resolution, contact localization, and force regression. We also assess its robustness across manufacturing variability, mechanical deformation, and sensing performance under real-world conditions. Furthermore, we validate the effectiveness of MagicGripper through three representative robotic tasks: a teleoperated assembly task, a contact-based alignment task, and an autonomous robotic grasping task. Across these experiments, MagicGripper exhibits reliable multimodal perception, accurate force estimation, and high adaptability to challenging manipulation scenarios. Our results highlight the potential of MagicGripper as a practical and versatile tool for embodied intelligence in complex, contact-rich environments.
comment: 19 pages, 24 figures
Imitation Learning-Based Path Generation for the Complex Assembly of Deformable Objects
This paper investigates how learning can be used to ease the design of high-quality paths for the assembly of deformable objects. Object dynamics plays an important role when manipulating deformable objects; thus, detailed models are often used when conducting motion planning for deformable objects. We propose to use human demonstrations and learning to enable motion planning of deformable objects with only simple dynamical models of the objects. In particular, we use the offline collision-free path planning, to generate a large number of reference paths based on a simple model of the deformable object. Subsequently, we execute the collision-free paths on a robot with a compliant control such that a human can slightly modify the path to complete the task successfully. Finally, based on the virtual path data sets and the human corrected ones, we use behavior cloning (BC) to create a dexterous policy that follows one reference path to finish a given task.
DTR: Delaunay Triangulation-based Racing for Scaled Autonomous Racing
Reactive controllers for autonomous racing avoid the computational overhead of full ee-Think-Act autonomy stacks by directly mapping sensor input to control actions, eliminating the need for localization and planning. A widely used reactive strategy is FTG, which identifies gaps in LiDAR range measurements and steers toward a chosen one. While effective on fully bounded circuits, FTG fails in scenarios with incomplete boundaries and is prone to driving into dead-ends, known as FTG-traps. This work presents DTR, a reactive controller that combines Delaunay triangulation, from raw LiDAR readings, with track boundary segmentation to extract a centerline while systematically avoiding FTG-traps. Compared to FTG, the proposed method achieves lap times that are 70\% faster and approaches the performance of map-dependent methods. With a latency of 8.95 ms and CPU usage of only 38.85\% on the robot's OBC, DTR is real-time capable and has been successfully deployed and evaluated in field experiments.
SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping
Recent advancements in 3D robotic manipulation have improved grasping of everyday objects, but transparent and specular materials remain challenging due to depth sensing limitations. While several 3D reconstruction and depth completion approaches address these challenges, they suffer from setup complexity or limited observation information utilization. To address this, leveraging the power of single view 3D object reconstruction approaches, we propose a training free framework SR3D that enables robotic grasping of transparent and specular objects from a single view observation. Specifically, given single view RGB and depth images, SR3D first uses the external visual models to generate 3D reconstructed object mesh based on RGB image. Then, the key idea is to determine the 3D object's pose and scale to accurately localize the reconstructed object back into its original depth corrupted 3D scene. Therefore, we propose view matching and keypoint matching mechanisms,which leverage both the 2D and 3D's inherent semantic and geometric information in the observation to determine the object's 3D state within the scene, thereby reconstructing an accurate 3D depth map for effective grasp detection. Experiments in both simulation and real world show the reconstruction effectiveness of SR3D.
SignBot: Learning Human-to-Humanoid Sign Language Interaction
Sign language is a natural and visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interactions with minorities. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates translator, responser, and generator of sign language, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.
Safety-Aware Robust Model Predictive Control for Robotic Arms in Dynamic Environments
Robotic manipulators are essential for precise industrial pick-and-place operations, yet planning collision-free trajectories in dynamic environments remains challenging due to uncertainties such as sensor noise and time-varying delays. Conventional control methods often fail under these conditions, motivating the development of Robust MPC (RMPC) strategies with constraint tightening. In this paper, we propose a novel RMPC framework that integrates phase-based nominal control with a robust safety mode, allowing smooth transitions between safe and nominal operations. Our approach dynamically adjusts constraints based on real-time predictions of moving obstacles\textemdash whether human, robot, or other dynamic objects\textemdash thus ensuring continuous, collision-free operation. Simulation studies demonstrate that our controller improves both motion naturalness and safety, achieving faster task completion than conventional methods.
comment: This paper has been accepted to the CASE 2025 conference
Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
Can your humanoid walk up and hand you a full cup of beer, without spilling a drop? While humanoids are increasingly featured in flashy demos like dancing, delivering packages, traversing rough terrain, fine-grained control during locomotion remains a significant challenge. In particular, stabilizing a filled end-effector (EE) while walking is far from solved, due to a fundamental mismatch in task dynamics: locomotion demands slow-timescale, robust control, whereas EE stabilization requires rapid, high-precision corrections. To address this, we propose SoFTA, a Slow-Fast TwoAgent framework that decouples upper-body and lower-body control into separate agents operating at different frequencies and with distinct rewards. This temporal and objective separation mitigates policy interference and enables coordinated whole-body behavior. SoFTA executes upper-body actions at 100 Hz for precise EE control and lower-body actions at 50 Hz for robust gait. It reduces EE acceleration by 2-5x relative to baselines and performs much closer to human-level stability, enabling delicate tasks such as carrying nearly full cups, capturing steady video during locomotion, and disturbance rejection with EE stability.
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
Learning a generalizable bimanual manipulation policy is extremely challenging for embodied agents due to the large action space and the need for coordinated arm movements. Existing approaches rely on Vision-Language-Action (VLA) models to acquire bimanual policies. However, transferring knowledge from single-arm datasets or pre-trained VLA models often fails to generalize effectively, primarily due to the scarcity of bimanual data and the fundamental differences between single-arm and bimanual manipulation. In this paper, we propose a novel bimanual foundation policy by fine-tuning the leading text-to-video models to predict robot trajectories and training a lightweight diffusion policy for action generation. Given the lack of embodied knowledge in text-to-video models, we introduce a two-stage paradigm that fine-tunes independent text-to-flow and flow-to-video models derived from a pre-trained text-to-video model. Specifically, optical flow serves as an intermediate variable, providing a concise representation of subtle movements between images. The text-to-flow model predicts optical flow to concretize the intent of language instructions, and the flow-to-video model leverages this flow for fine-grained video prediction. Our method mitigates the ambiguity of language in single-stage text-to-video prediction and significantly reduces the robot-data requirement by avoiding direct use of low-level actions. In experiments, we collect high-quality manipulation data for real dual-arm robot, and the results of simulation and real-world experiments demonstrate the effectiveness of our method.
Humanoid Loco-Manipulations Pattern Generation and Stabilization Control
In order for a humanoid robot to perform loco-manipulation such as moving an object while walking, it is necessary to account for sustained or alternating external forces other than ground-feet reaction, resulting from humanoid-object contact interactions. In this letter, we propose a bipedal control strategy for humanoid loco-manipulation that can cope with such external forces. First, the basic formulas of the bipedal dynamics, i.e., linear inverted pendulum mode and divergent component of motion, are derived, taking into account the effects of external manipulation forces. Then, we propose a pattern generator to plan center of mass trajectories consistent with the reference trajectory of the manipulation forces, and a stabilizer to compensate for the error between desired and actual manipulation forces. The effectiveness of our controller is assessed both in simulation and loco-manipulation experiments with real humanoid robots.
Towards Tangible Immersion for Cobot Programming-by-Demonstration: Visual, Tactile and Haptic Interfaces for Mixed-Reality Cobot Automation in Semiconductor Manufacturing
Sensor-based reactive and hybrid approaches have proven a promising line of study to address imperfect knowledge in grasping and manipulation. However the reactive approaches are usually tightly coupled to a particular embodiment making transfer of knowledge difficult. This paper proposes a paradigm for modeling and execution of reactive manipulation actions, which makes knowledge transfer to different embodiments possible while retaining the reactive capabilities of the embodiments. The proposed approach extends the idea of control primitives coordinated by a state machine by introducing an embodiment independent layer of abstraction. Abstract manipulation primitives constitute a vocabulary of atomic, embodiment independent actions, which can be coordinated using state machines to describe complex actions. To obtain embodiment specific models, the abstract state machines are automatically translated to embodiment specific models, such that full capabilities of each platform can be utilized. The strength of the manipulation primitives paradigm is demonstrated by developing a set of corresponding embodiment specific primitives for object transport, including a complex reactive grasping primitive. The robustness of the approach is experimentally studied in emptying of a box filled with several unknown objects. The embodiment independence is studied by performing a manipulation task on two different platforms using the same abstract description.
comment: 4 Pages, 5 Figures
MSC-LIO: An MSCKF-Based LiDAR-Inertial Odometry with Same-Plane Cluster Tracking
The multi-state constraint Kalman filter (MSCKF) has been proven to be more efficient than graph optimization for visual-based odometry while with similar accuracy. However, it has not been adequately considered and studied for LiDAR-based odometry. In this paper, we propose a novel tightly-coupled LiDAR-inertial odometry based on the MSCKF framework, named MSC-LIO. An efficient LiDAR same-plane cluster (LSPC) tracking method, without explicit feature extraction, is present for frame-to-frame data associations. The tracked LSPC is used to build an LSPC measurement model that constructs multi-state constraints. Besides, we propose an effective point-velocity-based LiDAR-IMU time-delay (LITD) estimation method, which is derived from the proposed LSPC tracking method. To validate the effectiveness and robustness of the proposed method, we conducted extensive experiments on both public datasets and real-world environments. The results demonstrate that the proposed MSC-LIO yields higher accuracy and efficiency compared to the state-of-the-art methods. Ablation experiments indicate that the data-association efficiency is improved by nearly 3 times with the LSPC tracking, and the proposed LITD estimation method can effectively and accurately estimate the LITD. Besides, MSC-LIO was implemented on an edge device and demonstrated excellent real-time performance.
comment: 11 pages, 12 figures, 8 tables
Collision Probability Estimation for Optimization-based Vehicular Motion Planning
Many motion planning algorithms for automated driving require estimating the probability of collision (POC) to account for uncertainties in the measurement and estimation of the motion of road users. Common POC estimation techniques often utilize sampling-based methods that suffer from computational inefficiency and a non-deterministic estimation, i.e., each estimation result for the same inputs is slightly different. In contrast, optimization-based motion planning algorithms require computationally efficient POC estimation, ideally using deterministic estimation, such that typical optimization algorithms for motion planning retain feasibility. Estimating the POC analytically, however, is challenging because it depends on understanding the collision conditions (e.g., vehicle's shape) and characterizing the uncertainty in motion prediction. In this paper, we propose an approach in which we estimate the POC between two vehicles by over-approximating their shapes by a multi-circular shape approximation. The position and heading of the predicted vehicle are modelled as random variables, contrasting with the literature, where the heading angle is often neglected. We guarantee that the provided POC is an over-approximation, which is essential in providing safety guarantees, and present a computationally efficient algorithm for computing the POC estimate for Gaussian uncertainty in the position and heading. This algorithm is then used in a path-following stochastic model predictive controller (SMPC) for motion planning. With the proposed algorithm, the SMPC generates reproducible trajectories while the controller retains its feasibility in the presented test cases and demonstrates the ability to handle varying levels of uncertainty.
comment: 14 pages, 6 figures
Image-Based Roadmaps for Vision-Only Planning and Control of Robotic Manipulators
This work presents a motion planning framework for robotic manipulators that computes collision-free paths directly in image space. The generated paths can then be tracked using vision-based control, eliminating the need for an explicit robot model or proprioceptive sensing. At the core of our approach is the construction of a roadmap entirely in image space. To achieve this, we explicitly define sampling, nearest-neighbor selection, and collision checking based on visual features rather than geometric models. We first collect a set of image-space samples by moving the robot within its workspace, capturing keypoints along its body at different configurations. These samples serve as nodes in the roadmap, which we construct using either learned or predefined distance metrics. At runtime, the roadmap generates collision-free paths directly in image space, removing the need for a robot model or joint encoders. We validate our approach through an experimental study in which a robotic arm follows planned paths using an adaptive vision-based control scheme to avoid obstacles. The results show that paths generated with the learned-distance roadmap achieved 100% success in control convergence, whereas the predefined image-space distance roadmap enabled faster transient responses but had a lower success rate in convergence.
A study on the effects of mixed explicit and implicit communications in human-virtual-agent interactions
Communication between humans and robots (or virtual agents) is essential for interaction and often inspired by human communication, which uses gestures, facial expressions, gaze direction, and other explicit and implicit means. This work presents an interaction experiment where humans and virtual agents interact through explicit (gestures, manual entries using mouse and keyboard, voice, sound, and information on screen) and implicit (gaze direction, location, facial expressions, and raise of eyebrows) communication to evaluate the effect of mixed explicit-implicit communication against purely explicit communication. Results obtained using Bayesian parameter estimation show that the number of errors and task execution time did not significantly change when mixed explicit and implicit communications were used, and neither the perceived efficiency of the interaction. In contrast, acceptance, sociability, and transparency of the virtual agent increased when using mixed communication modalities (88.3%, 92%, and 92.9% of the effect size posterior distribution of each variable, respectively, were above the upper limit of the region of practical equivalence). This suggests that task-related measures, such as time, number of errors, and perceived efficiency of the interaction, have not been influenced by the communication type in our particular experiment. However, the improvement of subjective measures related to the virtual agent, such as acceptance, sociability, and transparency, suggests that humans are more receptive to mixed explicit and implicit communications.
comment: Main paper with 23 pages, 12 figures, 4 tables. Supplementary material with 19 pages, 16 figures, 2 tables. Submitted to Intelligent Service Robotics
Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization
Despite over a decade of development, autonomous driving trajectory planning in complex urban environments continues to encounter significant challenges. These challenges include the difficulty in accommodating the multi-modal nature of trajectories, the limitations of single expert model in managing diverse scenarios, and insufficient consideration of environmental interactions. To address these issues, this paper introduces the EMoE-Planner, which incorporates three innovative approaches. Firstly, the Explicit MoE (Mixture of Experts) dynamically selects specialized experts based on scenario-specific information through a shared scene router. Secondly, the planner utilizes scene-specific queries to provide multi-modal priors, directing the model's focus towards relevant target areas. Lastly, it enhances the prediction model and loss calculation by considering the interactions between the ego vehicle and other agents, thereby significantly boosting planning performance. Comparative experiments were conducted on the Nuplan dataset against the state-of-the-art methods. The simulation results demonstrate that our model consistently outperforms SOTA models across nearly all test scenarios. Our model is the first pure learning model to achieve performance surpassing rule-based algorithms in almost all Nuplan closed-loop simulations.
comment: Main text 10 pages with 7 figures
Blimp-based Crime Scene Analysis
Crime is a critical problem -- which often takes place behind closed doors, posing additional difficulties for investigators. To bring hidden truths to light, evidence at indoor crime scenes must be documented before any contamination or degradation occurs. Here, we address this challenge from the perspective of artificial intelligence (AI), computer vision, and robotics: Specifically, we explore the use of a blimp as a "floating camera" to drift over and record evidence with minimal disturbance. Adopting a rapid prototyping approach, we develop a proof-of-concept to investigate capabilities required for manual or semi-autonomous operation. Consequently, our results demonstrate the feasibility of equipping indoor blimps with various components (such as RGB and thermal cameras, LiDARs, and WiFi, with 20 minutes of battery life). Moreover, we confirm the core premise: that such blimps can be used to observe crime scene evidence while generating little airflow. We conclude by proposing some ideas related to detection (e.g., of bloodstains), mapping, and path planning, with the aim of stimulating further discussion and exploration.
comment: 16 pages, 5 figures, 1 table; Accepted for SAIS 2025
Ontological Component-based Description of Robot Capabilities
A key aspect of a robot's knowledge base is self-awareness about what it is capable of doing. It allows to define which tasks it can be assigned to and which it cannot. We will refer to this knowledge as the Capability concept. As capabilities stems from the components the robot owns, they can be linked together. In this work, we hypothesize that this concept can be inferred from the components rather than merely linked to them. Therefore, we introduce an ontological means of inferring the agent's capabilities based on the components it owns as well as low-level capabilities. This inference allows the agent to acknowledge what it is able to do in a responsive way and it is generalizable to external entities the agent can carry for example. To initiate an action, the robot needs to link its capabilities with external entities. To do so, it needs to infer affordance relations from its capabilities as well as the external entity's dispositions. This work is part of a broader effort to integrate social affordances into a Human-Robot collaboration context and is an extension of an already existing ontology.
comment: International Workshop on Working towards Ontology-based Standards for Robotics and Automation (WOSRA 2023 - 2nd Edition), Jun 2023, Londres, United Kingdom
Interactive OT Gym: A Reinforcement Learning-Based Interactive Optical tweezer (OT)-Driven Microrobotics Simulation Platform ICRA 2025
Optical tweezers (OT) offer unparalleled capabilities for micromanipulation with submicron precision in biomedical applications. However, controlling conventional multi-trap OT to achieve cooperative manipulation of multiple complex-shaped microrobots in dynamic environments poses a significant challenge. To address this, we introduce Interactive OT Gym, a reinforcement learning (RL)-based simulation platform designed for OT-driven microrobotics. Our platform supports complex physical field simulations and integrates haptic feedback interfaces, RL modules, and context-aware shared control strategies tailored for OT-driven microrobot in cooperative biological object manipulation tasks. This integration allows for an adaptive blend of manual and autonomous control, enabling seamless transitions between human input and autonomous operation. We evaluated the effectiveness of our platform using a cell manipulation task. Experimental results show that our shared control system significantly improves micromanipulation performance, reducing task completion time by approximately 67% compared to using pure human or RL control alone and achieving a 100% success rate. With its high fidelity, interactivity, low cost, and high-speed simulation capabilities, Interactive OT Gym serves as a user-friendly training and testing environment for the development of advanced interactive OT-driven micromanipulation systems and control algorithms. For more details on the project, please see our website https://sites.google.com/view/otgym
comment: ICRA 2025
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
comment: arXiv admin note: text overlap with arXiv:2412.01034
Learning-Based Leader Localization for Underwater Vehicles With Optical-Acoustic-Pressure Sensor Fusion
Underwater vehicles have emerged as a critical technology for exploring and monitoring aquatic environments. The deployment of multi-vehicle systems has gained substantial interest due to their capability to perform collaborative tasks with improved efficiency. However, achieving precise localization of a leader underwater vehicle within a multi-vehicle configuration remains a significant challenge, particularly in dynamic and complex underwater conditions. To address this issue, this paper presents a novel tri-modal sensor fusion neural network approach that integrates optical, acoustic, and pressure sensors to localize the leader vehicle. The proposed method leverages the unique strengths of each sensor modality to improve localization accuracy and robustness. Specifically, optical sensors provide high-resolution imaging for precise relative positioning, acoustic sensors enable long-range detection and ranging, and pressure sensors offer environmental context awareness. The fusion of these sensor modalities is implemented using a deep learning architecture designed to extract and combine complementary features from raw sensor data. The effectiveness of the proposed method is validated through a custom-designed testing platform. Extensive data collection and experimental evaluations demonstrate that the tri-modal approach significantly improves the accuracy and robustness of leader localization, outperforming both single-modal and dual-modal methods.
A Comprehensive Survey on Physical Risk Control in the Era of Foundation Model-enabled Robotics IJCAI 2025
Recent Foundation Model-enabled robotics (FMRs) display greatly improved general-purpose skills, enabling more adaptable automation than conventional robotics. Their ability to handle diverse tasks thus creates new opportunities to replace human labor. However, unlike general foundation models, FMRs interact with the physical world, where their actions directly affect the safety of humans and surrounding objects, requiring careful deployment and control. Based on this proposition, our survey comprehensively summarizes robot control approaches to mitigate physical risks by covering all the lifespan of FMRs ranging from pre-deployment to post-accident stage. Specifically, we broadly divide the timeline into the following three phases: (1) pre-deployment phase, (2) pre-incident phase, and (3) post-incident phase. Throughout this survey, we find that there is much room to study (i) pre-incident risk mitigation strategies, (ii) research that assumes physical interaction with humans, and (iii) essential issues of foundation models themselves. We hope that this survey will be a milestone in providing a high-resolution analysis of the physical risks of FMRs and their control, contributing to the realization of a good human-robot relationship.
comment: Accepted to IJCAI 2025 Survey Track
Accelerating the Evolution of Personalized Automated Lane Change through Lesson Learning
Personalization is crucial for the widespread adoption of advanced driver assistance system. To match up with each user's preference, the online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: learning from driver's takeover interventions. By leveraging online takeover data, the driving zone is generated to ensure perceived safety using Gaussian discriminant analysis. Real-time corrections to trajectory planning rewards are enacted through apprenticeship learning. Guided by the objective of optimizing rewards within the constraints of the driving zone, this approach employs model predictive control for trajectory planning. This lesson learning framework is highlighted for its faster evolution capability, adeptness at experience accumulating, assurance of perceived safety, and computational efficiency. Simulation results demonstrate that the proposed system consistently achieves a successful customization without further takeover interventions. Accumulated experience yields a 24% enhancement in evolution efficiency. The average number of learning iterations is only 13.8. The average computation time is 0.08 seconds.
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI
Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments due to minimal python/pytorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU parallelized environments/tasks spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dextrous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms.
comment: Project website: http://maniskill.ai/
Bayesian Inferential Motion Planning Using Heavy-Tailed Distributions
Robots rely on motion planning to navigate safely and efficiently while performing various tasks. In this paper, we investigate motion planning through Bayesian inference, where motion plans are inferred based on planning objectives and constraints. However, existing Bayesian motion planning methods often struggle to explore low-probability regions of the planning space, where high-quality plans may reside. To address this limitation, we propose the use of heavy-tailed distributions -- specifically, Student's-$t$ distributions -- to enhance probabilistic inferential search for motion plans. We develop a novel sequential single-pass smoothing approach that integrates Student's-$t$ distribution with Monte Carlo sampling. A special case of this approach is ensemble Kalman smoothing, which depends on short-tailed Gaussian distributions. We validate the proposed approach through simulations in autonomous vehicle motion planning, demonstrating its superior performance in planning, sampling efficiency, and constraint satisfaction compared to ensemble Kalman smoothing. While focused on motion planning, this work points to the broader potential of heavy-tailed distributions in enhancing probabilistic decision-making in robotics.
Behavioral Safety Assessment towards Large-scale Deployment of Autonomous Vehicles
Autonomous vehicles (AVs) have significantly advanced in real-world deployment in recent years, yet safety continues to be a critical barrier to widespread adoption. Traditional functional safety approaches, which primarily verify the reliability, robustness, and adequacy of AV hardware and software systems from a vehicle-centric perspective, do not sufficiently address the AV's broader interactions and behavioral impact on the surrounding traffic environment. To overcome this limitation, we propose a paradigm shift toward behavioral safety, a comprehensive approach focused on evaluating AV responses and interactions within traffic environment. To systematically assess behavioral safety, we introduce a third-party AV safety assessment framework comprising two complementary evaluation components: Driver Licensing Test and Driving Intelligence Test. The Driver Licensing Test evaluates AV's reactive behaviors under controlled scenarios, ensuring basic behavioral competency. In contrast, the Driving Intelligence Test assesses AV's interactive behaviors within naturalistic traffic conditions, quantifying the frequency of safety-critical events to deliver statistically meaningful safety metrics before large-scale deployment. We validated our proposed framework using \texttt{Autoware.Universe}, an open-source Level 4 AV, tested both in simulated environments and on the physical test track at the University of Michigan's Mcity Testing Facility. The results indicate that \texttt{Autoware.Universe} passed 6 out of 14 scenarios and exhibited a crash rate of 3.01e-3 crashes per mile, approximately 1,000 times higher than average human driver crash rate. During the tests, we also uncovered several unknown unsafe scenarios for \texttt{Autoware.Universe}. These findings underscore the necessity of behavioral safety evaluations for improving AV safety performance prior to widespread public deployment.
comment: Code and Supplementary Materials available at: https://github.com/michigan-traffic-lab/Behavioral-Safety-Assessment
Fast Convergence of Softmax Policy Mirror Ascent AISTATS 2025
Natural policy gradient (NPG) is a common policy optimization algorithm and can be viewed as mirror ascent in the space of probabilities. Recently, Vaswani et al. [2021] introduced a policy gradient method that corresponds to mirror ascent in the dual space of logits. We refine this algorithm, removing its need for a normalization across actions and analyze the resulting method (referred to as SPMA). For tabular MDPs, we prove that SPMA with a constant step-size matches the linear convergence of NPG and achieves a faster convergence than constant step-size (accelerated) softmax policy gradient. To handle large state-action spaces, we extend SPMA to use a log-linear policy parameterization. Unlike that for NPG, generalizing SPMA to the linear function approximation (FA) setting does not require compatible function approximation. Unlike MDPO, a practical generalization of NPG, SPMA with linear FA only requires solving convex softmax classification problems. We prove that SPMA achieves linear convergence to the neighbourhood of the optimal value function. We extend SPMA to handle non-linear FA and evaluate its empirical performance on the MuJoCo and Atari benchmarks. Our results demonstrate that SPMA consistently achieves similar or better performance compared to MDPO, PPO and TRPO.
comment: AISTATS 2025
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
Co-Design of Soft Gripper with Neural Physics
For robot manipulation, both the controller and end-effector design are crucial. Soft grippers are generalizable by deforming to different geometries, but designing such a gripper and finding its grasp pose remains challenging. In this paper, we propose a co-design framework that generates an optimized soft gripper's block-wise stiffness distribution and its grasping pose, using a neural physics model trained in simulation. We derived a uniform-pressure tendon model for a flexure-based soft finger, then generated a diverse dataset by randomizing both gripper pose and design parameters. A neural network is trained to approximate this forward simulation, yielding a fast, differentiable surrogate. We embed that surrogate in an end-to-end optimization loop to optimize the ideal stiffness configuration and best grasp pose. Finally, we 3D-print the optimized grippers of various stiffness by changing the structural parameters. We demonstrate that our co-designed grippers significantly outperform baseline designs in both simulation and hardware experiments.
Extended Set-based Tasks for Multi-task Execution and Prioritization
The ability of executing multiple tasks simultaneously is an important feature of redundant robotic systems. As a matter of fact, complex behaviors can often be obtained as a result of the execution of several tasks. Moreover, in safety-critical applications, tasks designed to ensure the safety of the robot and its surroundings have to be executed along with other nominal tasks. In such cases, it is also important to prioritize the former over the latter. In this paper, we formalize the definition of extended set-based tasks, i.e., tasks which can be executed by rendering subsets of the task space asymptotically stable or forward invariant using control barrier functions. We propose a formal mathematical representation of such tasks that allows for the execution of more complex and time-varying prioritized stacks of tasks using kinematic and dynamic robot models alike. We present an optimization-based framework which is computationally efficient, accounts for input bounds, and allows for the stable execution of time-varying prioritized stacks of extended set-based tasks. The proposed framework is validated using extensive simulations, quantitative comparisons to the state-of-the-art hierarchical quadratic programming, and experiments with robotic manipulators.
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory
To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the the AD model's fitting capacities, the AD model is prone to under-fitting. To address this issue, we propose to use a pretrained Large Vision Models (LVMs) as backbone coupled with downstream perception head to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to LVMs' powerful fitting capabilities, but also enhance the perception generalization thanks to LVMs' vast and diverse training data. On the other hand, to mitigate vehicles' computational burden of training the perception head while running LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48\% and converges faster over 6 times, compared to the existing state-of-the-art approach.
comment: 7 pages
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
comment: This paper was originally submitted as a class project and included the name of a faculty member without prior permission. At the instructor's request, I am withdrawing the paper. The work may be resubmitted in the future after further development and testing
Multiagent Systems
Distributed Intelligence in the Computing Continuum with Active Inference
The Computing Continuum (CC) is an emerging Internet-based computing paradigm that spans from local Internet of Things sensors and constrained edge devices to large-scale cloud data centers. Its goal is to orchestrate a vast array of diverse and distributed computing resources to support the next generation of Internet-based applications. However, the distributed, heterogeneous, and dynamic nature of CC platforms demands distributed intelligence for adaptive and resilient service management. This article introduces a distributed stream processing pipeline as a CC use case, where each service is managed by an Active Inference (AIF) agent. These agents collaborate to fulfill service needs specified by SLOiDs, a term we introduce to denote Service Level Objectives that are aware of its deployed devices, meaning that non-functional requirements must consider the characteristics of the hosting device. We demonstrate how AIF agents can be modeled and deployed alongside distributed services to manage them autonomously. Our experiments show that AIF agents achieve over 90% SLOiD fulfillment when using tested transition models, and around 80% when learning the models during deployment. We compare their performance to a multi-agent reinforcement learning algorithm, finding that while both approaches yield similar results, MARL requires extensive training, whereas AIF agents can operate effectively from the start. Additionally, we evaluate the behavior of AIF agents in offloading scenarios, observing a strong capacity for adaptation. Finally, we outline key research directions to advance AIF integration in CC platforms.
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning ICML 2025
Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent's past experience during training, neglecting their influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes the proposed objective through contrastive learning on past trajectories to first derive intermediate roles that shape intrinsic rewards to promote diversity in future behaviors across different roles through a learned dynamics model. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%.
comment: 21 pages, To appear in the International Conference of Machine Learning (ICML 2025)
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To resolve this issue, in this paper, we introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game, where the agents communicate and contribute to a final system output. Our system associates a credibility score that is used when aggregating the team outputs. The credibility scores are learned gradually based on the past contributions of each agent in query answering. Our experiments across multiple tasks and settings demonstrate our system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in the adversary-majority settings.
Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning
This paper studies the networked multi-agent reinforcement learning (NMARL) problem, where the objective of agents is to collaboratively maximize the discounted average cumulative rewards. Different from the existing methods that suffer from poor expression due to linear function approximation, we propose a distributed neural policy gradient algorithm that features two innovatively designed neural networks, specifically for the approximate Q-functions and policy functions of agents. This distributed neural policy gradient algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via a time-varying communication networks to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates its local policy parameter solely based on its own approximate Q-function. In the convergence analysis, we first establish the global convergence of agents for the joint policy evaluation in the distributed critic step. Subsequently, we rigorously demonstrate the global convergence of the overall distributed neural policy gradient algorithm with respect to the objective function. Finally, the effectiveness of the proposed algorithm is demonstrated by comparing it with a centralized algorithm through simulation in the robot path planning environment.
Sorrel: A simple and flexible framework for multi-agent reinforcement learning
We introduce Sorrel (https://github.com/social-ai-uoft/sorrel), a simple Python interface for generating and testing new multi-agent reinforcement learning environments. This interface places a high degree of emphasis on simplicity and accessibility, and uses a more psychologically intuitive structure for the basic agent-environment loop, making it a useful tool for social scientists to investigate how learning and social interaction leads to the development and change of group dynamics. In this short paper, we outline the basic design philosophy and features of Sorrel.
Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization
Recently, many approaches, such as Chain-of-Thought (CoT) prompting and Multi-Agent Debate (MAD), have been proposed to further enrich Large Language Models' (LLMs) complex problem-solving capacities in reasoning scenarios. However, these methods may fail to solve complex problems due to the lack of ability to find optimal solutions. Swarm Intelligence has been serving as a powerful tool for finding optima in the field of traditional optimization problems. To this end, we propose integrating swarm intelligence into the reasoning process by introducing a novel Agent-based Swarm Intelligence (ASI) paradigm. In this paradigm, we formulate LLM reasoning as an optimization problem and use a swarm intelligence scheme to guide a group of LLM-based agents in collaboratively searching for optimal solutions. To avoid swarm intelligence getting trapped in local optima, we further develop a Swarm Intelligence Enhancing Reasoning (SIER) framework, which develops a density-driven strategy to enhance the reasoning ability. To be specific, we propose to perform kernel density estimation and non-dominated sorting to optimize both solution quality and diversity simultaneously. In this case, SIER efficiently enhances solution space exploration through expanding the diversity of the reasoning path. Besides, a step-level quality evaluation is used to help agents improve solution quality by correcting low-quality intermediate steps. Then, we use quality thresholds to dynamically control the termination of exploration and the selection of candidate steps, enabling a more flexible and efficient reasoning process. Extensive experiments are ...
ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
Multi-agent systems (MAS) have emerged as a promising approach for enhancing the reasoning capabilities of large language models in complex problem-solving; however, current MAS frameworks suffer from poor flexibility and scalability with underdeveloped optimization strategies. To address these challenges, we propose ReSo, which integrates task graph generation with a reward-driven two-stage agent selection process centered on our Collaborative Reward Model that provides fine-grained reward signals to optimize MAS cooperation. We also introduce an automated data synthesis framework for generating MAS benchmarks without any human annotations. Experimental results show that ReSo matches or outperforms existing methods, achieving 33.7 percent accuracy on Math-MAS and 32.3 percent accuracy on SciBench-MAS, where other approaches completely fail.
On Incentivizing Social Information Sharing Through Routing Games
Crowdsourcing services, such as Waze, leverage a mass of mobile users to learn massive point-of-interest (PoI) information while traveling and share it as a public good. Given that crowdsourced users mind their travel costs and possess various preferences over the PoI information along different paths, we formulate the problem as a novel non-atomic multi-path routing game with positive network externalities among users in social information sharing. In the absence of any incentive design, our price of anarchy (PoA) analysis shows that users' selfish routing on the path with the lowest cost will limit information diversity and lead to $PoA = 0$ with an arbitrarily large efficiency loss from the social optimum. This motivates us to design effective incentive mechanisms to remedy while upholding desirable properties such as individual rationality, incentive compatibility, and budget balance for practical users. Without requiring a specific user's path preference, we present a non-monetary mechanism called Adaptive Information Restriction (AIR) that reduces non-cooperative users' access to the public good as an indirect penalty, which meets all the desirable properties. By meticulously adapting penalty fractions to the actual user flows along different paths, our AIR achieves non-trivial $PoA = \frac{1}{4}$ with low complexity $O(k\log k+\log m)$, where $k$ and $m$ denote the numbers of involved paths and user types, respectively. If the system can further enable pricing for users, we then propose a new monetary mechanism called Adaptive Side-Payment (ASP), which adaptively charges and rewards users according to their chosen paths, respectively. Our ASP mechanism successively achieves a $PoA = \frac{1}{2}$ with even reduced complexity $O(k\log k)$. Finally, our theoretical findings are well corroborated by our experimental results using a real-world public dataset.
comment: This manuscript is a revised version of our earlier submission
Systems and Control (CS)
Convex Approximations of Random Constrained Markov Decision Processes
Constrained Markov decision processes (CMDPs) are used as a decision-making framework to study the long-run performance of a stochastic system. It is well-known that a stationary optimal policy of a CMDP problem under discounted cost criterion can be obtained by solving a linear programming problem when running costs and transition probabilities are exactly known. In this paper, we consider a discounted cost CMDP problem where the running costs and transition probabilities are defined using random variables. Consequently, both the objective function and constraints become random. We use chance constraints to model these uncertainties and formulate the uncertain CMDP problem as a joint chance-constrained Markov decision process (JCCMDP). Under random running costs, we assume that the dependency among random constraint vectors is driven by a Gumbel-Hougaard copula. Using standard probability inequalities, we construct convex upper bound approximations of the JCCMDP problem under certain conditions on random running costs. In addition, we propose a linear programming problem whose optimal value gives a lower bound to the optimal value of the JCCMDP problem. When both running costs and transition probabilities are random, we define the latter variables as a sum of their means and random perturbations. Under mild conditions on the random perturbations and random running costs, we construct convex upper and lower bound approximations of the JCCMDP problem. We analyse the quality of the derived bounds through numerical experiments on a queueing control problem for random running costs. For the case when both running costs and transition probabilities are random, we choose randomly generated Markov decision problems called Garnets for numerical experiments.
Input-Power-to-State Stability of Time-Varying Systems
When the state of a system may remain bounded even if both the input amplitude and energy are unbounded, then the state bounds given by the standard input-to-state stability (ISS) and integral-ISS (iISS) properties may provide no useful information. This paper considers an ISS-related concept suitable in such a case: input-power-to-state stability (IPSS). Necessary and sufficient conditions for IPSS are developed for time-varying systems under very mild assumptions on the dynamics. More precisely, it is shown that (a) the existence of a dissipation-form ISS-Lyapunov function implies IPSS, but not necessarily that of an implication-form one, (b) iISS with exponential class-$\KL$ function implies IPSS, and (c) ISS and stronger assumptions on the dynamics imply the existence of a dissipation-form ISS-Lyapunov function and hence IPSS. The latter result is based on a converse Lyapunov theorem for time-varying systems whose dynamics (i.e. state derivative) is not necessarily continuous with respect to time.
comment: Submitted to Automatica
Robust Distribution Network Reconfiguration Using Mapping-based Column-and-Constraint Generation
The integration of intermittent renewable energy sources into distribution networks introduces significant uncertainties and fluctuations, challenging their operational security, stability, and efficiency. This paper considers robust distribution network reconfiguration (RDNR) with renewable generator resizing, modeled as a two-stage robust optimization (RO) problem with decision-dependent uncertainty (DDU). Our model optimizes resizing decisions as the upper bounds of renewable generator outputs, while also optimizing the network topology. We design a mapping-based column-and-constraint generation (C&CG) algorithm to address the computational challenges raised by DDU. Sensitivity analyses further explore the impact of uncertainty set parameters on optimal solutions. Case studies demonstrate the effectiveness of the proposed algorithm in reducing computational complexity while ensuring solution optimality.
Distributed Intelligence in the Computing Continuum with Active Inference
The Computing Continuum (CC) is an emerging Internet-based computing paradigm that spans from local Internet of Things sensors and constrained edge devices to large-scale cloud data centers. Its goal is to orchestrate a vast array of diverse and distributed computing resources to support the next generation of Internet-based applications. However, the distributed, heterogeneous, and dynamic nature of CC platforms demands distributed intelligence for adaptive and resilient service management. This article introduces a distributed stream processing pipeline as a CC use case, where each service is managed by an Active Inference (AIF) agent. These agents collaborate to fulfill service needs specified by SLOiDs, a term we introduce to denote Service Level Objectives that are aware of its deployed devices, meaning that non-functional requirements must consider the characteristics of the hosting device. We demonstrate how AIF agents can be modeled and deployed alongside distributed services to manage them autonomously. Our experiments show that AIF agents achieve over 90% SLOiD fulfillment when using tested transition models, and around 80% when learning the models during deployment. We compare their performance to a multi-agent reinforcement learning algorithm, finding that while both approaches yield similar results, MARL requires extensive training, whereas AIF agents can operate effectively from the start. Additionally, we evaluate the behavior of AIF agents in offloading scenarios, observing a strong capacity for adaptation. Finally, we outline key research directions to advance AIF integration in CC platforms.
2D PZT MEMS Resonant Scanner Using a Three-Mask Process
This work presents the design, simulation, fabrication, and characterization of a novel architectural compact two-dimensional (2D) resonant MEMS scanning mirror actuated by thin-film lead zirconate titanate (PZT). The device employs an innovative mechanically coupled dual-axis architecture fabricated using a three-mask process on an SOI-PZT deposited wafer, significantly reducing system complexity while achieving high performance. The scanner integrates a 1 $\times$ 1.4 mm oval mirror within a 7 $\times$ 4.7 mm die, actuated by PZT thin-film elements optimized for resonant operation at 3.6 kHz (vertical) and 54.2 kHz (horizontal) under 12 V$_{\mathrm{p-p}}$ periodic pulse driving. The system achieves optical scan angles of 4.8$^\circ$ and 11.5$^\circ$ in vertical and horizontal directions, respectively, with quality factors of 750 (vertical) and 1050 (horizontal). These values contribute to high scanning bandwidth-efficiency products of 24.2 deg$\cdot$mm$\cdot$kHz (vertical) and 623 deg$\cdot$mm$\cdot$kHz (horizontal), among the higher values reported for 2D PZT-MEMS scanners. Finite element analysis confirmed minimal stress and mirror deformation, and experimental validation demonstrated excellent agreement with simulation results. This architecture demonstrates the feasibility of high-resolution laser scanning, as required in applications such as OCT, LiDAR, and displays, by achieving performance levels in line with those used in such systems.
comment: The first two listed authors contributed equally. Parviz Zolfaghari and Hakan Urey are the corresponding authors
Incremental Gain Computation and Regulation of Discrete-time Positive Luré Systems using Linear Programming
This work approaches the problem of computing incremental $\ell_1$ and $\ell_\infty$ gains for discrete-time positive systems in \lure feedback with static memoryless nonlinearities, and regulating the $\ell_\infty$ gain through the design of a state-feedback controller. Finite incremental gains provide a quantitative measure of robustness for trajectories, and will ensure that all pairs of trajectories will converge to a fixed point or will diverge together in the absence of an applied input. Upper-bounds on these incremental gains can be computed through linear programming. Computation and regulation of the $\ell_1$ and $\ell_\infty$ incremental gains are verified by numerical examples.
comment: 13 pages, 4 images
Damping LFOs: Grid Following with Power Oscillation Damping vs. Grid Forming vs. PSS
Low-frequency oscillations (LFOs) present a significant challenge to the stability and reliability of power systems, especially in grids with a high penetration of renewable energy sources. Traditional grid-following (GFL) inverters have proven less effective in damping such oscillations. This paper presents a GFL-power plant controller with an auxiliary power oscillation damping control for damping LFOs. This approach is compared with a traditional power system stabilizer (PSS) for a two-area power system. Next, the research is extended by deploying grid forming (GFM) controls, which by actively controlling the voltage and frequency dynamics emulate the behavior of traditional synchronous generators. The paper analyzes two GFM control strategies: virtual synchronous machine (VSM) and droop control, and demonstrates their effectiveness in damping LFOs in the test system. The simulation results reveal that the performance of the proposed GFM-VSM rivals that of the PSS and is better than the GFL-power oscillation damper.
A Causation-Based Framework for Pricing and Cost Allocation of Energy, Reserves, and Transmission in Modern Power Systems
The increasing vulnerability of power systems has heightened the need for operating reserves to manage contingencies such as generator outages, line failures, and sudden load variations. Unlike energy costs, driven by consumer demand, operating reserve costs arise from addressing the most critical credible contingencies - prompting the question: how should these costs be allocated through efficient pricing mechanisms? As an alternative to previously reported schemes, this paper presents a new causation-based pricing framework for electricity markets based on contingency-constrained energy and reserve scheduling models. Major salient features include a novel security charge mechanism along with the explicit definition of prices for up-spinning reserves, down-spinning reserves, and transmission services. These features ensure more comprehensive and efficient cost-reflective market operations. Moreover, the proposed nodal pricing scheme yields revenue adequacy and neutrality while promoting reliability incentives for generators based on the cost-causation principle. An additional salient aspect of the proposed framework is the economic incentive for transmission assets, which are remunerated based on their use to deliver energy and reserves across all contingency states. Numerical results from two case studies illustrate the performance of the proposed pricing scheme.
Humanoid Loco-Manipulations Pattern Generation and Stabilization Control
In order for a humanoid robot to perform loco-manipulation such as moving an object while walking, it is necessary to account for sustained or alternating external forces other than ground-feet reaction, resulting from humanoid-object contact interactions. In this letter, we propose a bipedal control strategy for humanoid loco-manipulation that can cope with such external forces. First, the basic formulas of the bipedal dynamics, i.e., linear inverted pendulum mode and divergent component of motion, are derived, taking into account the effects of external manipulation forces. Then, we propose a pattern generator to plan center of mass trajectories consistent with the reference trajectory of the manipulation forces, and a stabilizer to compensate for the error between desired and actual manipulation forces. The effectiveness of our controller is assessed both in simulation and loco-manipulation experiments with real humanoid robots.
Deception in Oligopoly Games via Adaptive Nash Seeking Systems
In the theory of multi-agent systems, deception refers to the strategic manipulation of information to influence the behavior of other agents, ultimately altering the long-term dynamics of the entire system. Recently, this concept has been examined in the context of model-free Nash equilibrium seeking (NES) algorithms for noncooperative games. Specifically, it was demonstrated that players can exploit knowledge of other players' exploration signals to drive the system toward a ``deceptive" Nash equilibrium, while maintaining the stability of the closed-loop system. To extend this insight beyond the duopoly case, in this paper we conduct a comprehensive study of deception mechanisms in N-player oligopoly markets. By leveraging the structure of these games and employing stability techniques for nonlinear dynamical systems, we provide game-theoretic insights into deception and derive specialized results, including stability conditions. These results allow players to systematically adjust their NES dynamics by tuning gains and signal amplitudes, all while ensuring closed-loop stability. Additionally, we introduce novel sufficient conditions to demonstrate that the (practically) stable equilibrium point of the deceptive dynamics corresponds to a true Nash equilibrium of a different game, which we term the ``deceptive game." Our results show that, under the proposed adaptive dynamics with deception, a victim firm may develop a distorted perception of its competitors' product appeal, which could lead to setting suboptimal prices.
Controller Design for Bilinear Neural Feedback Loops
This paper considers a class of bilinear systems with a neural network in the loop. These arise naturally when employing machine learning techniques to approximate general, non-affine in the input, control systems. We propose a controller design framework that combines linear fractional representations and tools from linear parameter varying control to guarantee local exponential stability of a desired equilibrium. The controller is obtained from the solution of linear matrix inequalities, which can be solved offline, making the approach suitable for online applications. The proposed methodology offers tools for stability and robustness analysis of deep neural networks interconnected with dynamical systems.
Enhancing Spatio-Temporal Resolution of Process-Based Life Cycle Analysis with Model-Based Systems Engineering \& Hetero-functional Graph Theory
Life cycle analysis (LCA) has emerged as a vital tool for assessing the environmental impacts of products, processes, and systems throughout their entire lifecycle. It provides a systematic approach to quantifying resource consumption, emissions, and waste, enabling industries, researchers, and policymakers to identify hotspots for sustainability improvements. By providing a comprehensive assessment of systems, from raw material extraction to end-of-life disposal, LCA facilitates the development of environmentally sound strategies, thereby contributing significantly to sustainable engineering and informed decision-making. Despite its strengths and ubiquitous use, life cycle analysis has not been reconciled with the broader literature in model-based systems engineering and analysis, thus hindering its integration into the design of complex systems more generally. This lack of reconciliation poses a significant problem, as it hinders the seamless integration of environmental sustainability into the design and optimization of complex systems. Without alignment between life cycle analysis (LCA) and model-based systems engineering (MBSE), sustainability remains an isolated consideration rather than an inherent part of the system's architecture and design. The original contribution of this paper is twofold. First, the paper reconciles process-based life cycle analysis with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory. It ultimately proves that model-based systems engineering and hetero-functional graph theory are a formal generalization of process-based life cycle analysis. Secondly, the paper demonstrates how model-based systems engineering and hetero-functional graph theory may be used to enhance the spatio-temporal resolution of process-based life cycle analysis in a manner that aligns with system design objectives.
comment: 12 pages, 5 figures
MRDust: Wireless Implant Data Uplink & Localization via Magnetic Resonance Image Modulation
Magnetic resonance imaging (MRI) exhibits rich and clinically useful endogenous contrast mechanisms, which can differentiate soft tissues and are sensitive to flow, diffusion, magnetic susceptibility, blood oxygenation level, and more. However, MRI sensitivity is ultimately constrained by Nuclear Magnetic Resonance (NMR) physics, and its spatiotemporal resolution is limited by SNR and spatial encoding. On the other hand, miniaturized implantable sensors offer highly localized physiological information, yet communication and localization can be challenging when multiple implants are present. This paper introduces the MRDust, an active ``contrast agent" that integrates active sensor implants with MRI, enabling the direct encoding of highly localized physiological data into MR images to augment the anatomical images. MRDust employs a micrometer-scale on-chip coil to actively modulate the local magnetic field, enabling MR signal amplitude and phase modulation for digital data transmission. Since MRI inherently captures the anatomical tissue structure, this method has the potential to enable simultaneous data communication, localization, and image registration with multiple implants. This paper presents the underlying physical principles, design tradeoffs, and design methodology for this approach. To validate the concept, a 900 $\times$ 990 $\mu$m$^2$ chip was designed using TSMC 28 nm technology, with an on-chip coil measuring 630 $\mu$m in diameter. The chip was tested with custom hardware in an MR750W GE3T MRI scanner. Successful voxel amplitude modulation is demonstrated with Spin-Echo Echo-Planar-Imaging (SE-EPI) sequence, achieving a contrast-to-noise ratio (CNR) of 25.58 with a power consumption of 130 $\mu$W.
comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (T-BioCAS) for review
Efficient Gaussian Mixture Filters based on Transition Density Approximation
Gaussian mixture filters for nonlinear systems usually rely on severe approximations when calculating mixtures in the prediction and filtering step. Thus, offline approximations of noise densities by Gaussian mixture densities to reduce the approximation error have been proposed. This results in exponential growth in the number of components, requiring ongoing component reduction, which is computationally complex. In this paper, the key idea is to approximate the true transition density by an axis-aligned Gaussian mixture, where two different approaches are derived. These approximations automatically ensure a constant number of components in the posterior densities without the need for explicit reduction. In addition, they allow a trade-off between estimation quality and computational complexity.
comment: Submitted to the conference FUSION 2025
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
Risk-Sensitive Model Predictive Control for Interaction-Aware Planning -- A Sequential Convexification Algorithm
This paper considers risk-sensitive model predictive control for stochastic systems with a decision-dependent distribution. This class of systems is commonly found in human-robot interaction scenarios. We derive computationally tractable convex upper bounds to both the objective function, and to frequently used penalty terms for collision avoidance, allowing us to efficiently solve the generally nonconvex optimal control problem as a sequence of convex problems. Simulations of a robot navigating a corridor demonstrate the effectiveness and the computational advantage of the proposed approach.
Non-asymptotic Error Analysis of Subspace Identification for Deterministic Systems
The subspace identification method (SIM) has been extensively employed in the identification of discrete-time multiple-input multiple-output (MIMO) linear time-invariant (LTI) systems. This paper focuses on the analysis of perturbation errors for the system matrices in state-space models and the corresponding system poles, under two unified SIMs, based on a single finite-length input/output sample trajectory. Specifically, we derive non-asymptotic upper bounds on these errors, providing a unified perspective across various SIM variants. Furthermore, we prove that SIMs become ill-conditioned for MIMO systems when the state-to-output dimensionality ratio $n/m$ is large, regardless of system parameters. Finally, numerical experiments are conducted to validate the non-asymptotic results and the ill-conditionedness of SIMs.
comment: 33 pages, 3 figures
Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems
In this letter, we propose an iterative joint detection algorithm of Kalman filter (KF) and channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes the prior information of control system to improve control and communication performance. In this algorithm, we first use the KF to estimate the probability density of the control system outputs and calculate the prior probability of received signals to assist decoder. Then, the possible outputs of the control system are traversed to update the prior probability in order to implement iterative detection. The simulation results show that the prior information and the iterative structure can reduce the block error rate performance of communications while improving the root mean square error performance of controls.
comment: 5 pages, 4 figures
A Tunable Universal Formula for Safety-Critical Control
Sontag's universal formula is a widely used technique for stabilizing control through control Lyapunov functions. Recently, it has been extended to address safety-critical control by incorporating control barrier functions (CBFs). However, deriving a universal formula that satisfies requirements on essential properties, including safety, smoothness, and robustness against input disturbances, is still an open problem. To address this challenge, this paper introduces a novel solution - a tunable universal formula - by incorporating a (state-dependent) tunable term into Sontag's formula. This tunable term enables the regulation of safety-critical control performances, allowing the attainment of desired properties through a proper selection of tunable terms. Generally, the tunable universal formula can be seen as a controller that improves the quadratic program (QP)-synthesized controllers in terms of robustness and smoothness, while also reducing the conservatism (corresponding to robustness) in Sontag's formula. Furthermore, we extend the tunable universal formula to address safety-critical control problems with norm-bounded input constraints, showcasing its applicability across diverse control scenarios. Finally, we demonstrate the efficacy of our method through a two-link manipulator safe tracking example, investigating the essential properties including safety, smoothness, and robustness against input disturbances under various tunable terms.
Extended Set-based Tasks for Multi-task Execution and Prioritization
The ability of executing multiple tasks simultaneously is an important feature of redundant robotic systems. As a matter of fact, complex behaviors can often be obtained as a result of the execution of several tasks. Moreover, in safety-critical applications, tasks designed to ensure the safety of the robot and its surroundings have to be executed along with other nominal tasks. In such cases, it is also important to prioritize the former over the latter. In this paper, we formalize the definition of extended set-based tasks, i.e., tasks which can be executed by rendering subsets of the task space asymptotically stable or forward invariant using control barrier functions. We propose a formal mathematical representation of such tasks that allows for the execution of more complex and time-varying prioritized stacks of tasks using kinematic and dynamic robot models alike. We present an optimization-based framework which is computationally efficient, accounts for input bounds, and allows for the stable execution of time-varying prioritized stacks of extended set-based tasks. The proposed framework is validated using extensive simulations, quantitative comparisons to the state-of-the-art hierarchical quadratic programming, and experiments with robotic manipulators.
End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
comment: This paper was originally submitted as a class project and included the name of a faculty member without prior permission. At the instructor's request, I am withdrawing the paper. The work may be resubmitted in the future after further development and testing
On zero-shot learning in neural state estimation of power distribution systems
This paper addresses the challenge of neural state estimation in power distribution systems. We identified a research gap in the current state of the art, which lies in the inability of models to adapt to changes in the power grid, such as loss of sensors and branch switching, in a zero-shot fashion. Based on the literature, we identified graph neural networks as the most promising class of models for this use case. Our experiments confirm their robustness to some grid changes and also show that a deeper network does not always perform better. We propose data augmentations to improve performance and conduct a comprehensive grid search of different model configurations for common zero-shot learning scenarios.
comment: 13 pages, 2 figures, associated source code available at https://gitlab.com/transense/nse-tl-paper/tree/IARIA
Systems and Control (EESS)
Convex Approximations of Random Constrained Markov Decision Processes
Constrained Markov decision processes (CMDPs) are used as a decision-making framework to study the long-run performance of a stochastic system. It is well-known that a stationary optimal policy of a CMDP problem under discounted cost criterion can be obtained by solving a linear programming problem when running costs and transition probabilities are exactly known. In this paper, we consider a discounted cost CMDP problem where the running costs and transition probabilities are defined using random variables. Consequently, both the objective function and constraints become random. We use chance constraints to model these uncertainties and formulate the uncertain CMDP problem as a joint chance-constrained Markov decision process (JCCMDP). Under random running costs, we assume that the dependency among random constraint vectors is driven by a Gumbel-Hougaard copula. Using standard probability inequalities, we construct convex upper bound approximations of the JCCMDP problem under certain conditions on random running costs. In addition, we propose a linear programming problem whose optimal value gives a lower bound to the optimal value of the JCCMDP problem. When both running costs and transition probabilities are random, we define the latter variables as a sum of their means and random perturbations. Under mild conditions on the random perturbations and random running costs, we construct convex upper and lower bound approximations of the JCCMDP problem. We analyse the quality of the derived bounds through numerical experiments on a queueing control problem for random running costs. For the case when both running costs and transition probabilities are random, we choose randomly generated Markov decision problems called Garnets for numerical experiments.
Input-Power-to-State Stability of Time-Varying Systems
When the state of a system may remain bounded even if both the input amplitude and energy are unbounded, then the state bounds given by the standard input-to-state stability (ISS) and integral-ISS (iISS) properties may provide no useful information. This paper considers an ISS-related concept suitable in such a case: input-power-to-state stability (IPSS). Necessary and sufficient conditions for IPSS are developed for time-varying systems under very mild assumptions on the dynamics. More precisely, it is shown that (a) the existence of a dissipation-form ISS-Lyapunov function implies IPSS, but not necessarily that of an implication-form one, (b) iISS with exponential class-$\KL$ function implies IPSS, and (c) ISS and stronger assumptions on the dynamics imply the existence of a dissipation-form ISS-Lyapunov function and hence IPSS. The latter result is based on a converse Lyapunov theorem for time-varying systems whose dynamics (i.e. state derivative) is not necessarily continuous with respect to time.
comment: Submitted to Automatica
Robust Distribution Network Reconfiguration Using Mapping-based Column-and-Constraint Generation
The integration of intermittent renewable energy sources into distribution networks introduces significant uncertainties and fluctuations, challenging their operational security, stability, and efficiency. This paper considers robust distribution network reconfiguration (RDNR) with renewable generator resizing, modeled as a two-stage robust optimization (RO) problem with decision-dependent uncertainty (DDU). Our model optimizes resizing decisions as the upper bounds of renewable generator outputs, while also optimizing the network topology. We design a mapping-based column-and-constraint generation (C&CG) algorithm to address the computational challenges raised by DDU. Sensitivity analyses further explore the impact of uncertainty set parameters on optimal solutions. Case studies demonstrate the effectiveness of the proposed algorithm in reducing computational complexity while ensuring solution optimality.
Distributed Intelligence in the Computing Continuum with Active Inference
The Computing Continuum (CC) is an emerging Internet-based computing paradigm that spans from local Internet of Things sensors and constrained edge devices to large-scale cloud data centers. Its goal is to orchestrate a vast array of diverse and distributed computing resources to support the next generation of Internet-based applications. However, the distributed, heterogeneous, and dynamic nature of CC platforms demands distributed intelligence for adaptive and resilient service management. This article introduces a distributed stream processing pipeline as a CC use case, where each service is managed by an Active Inference (AIF) agent. These agents collaborate to fulfill service needs specified by SLOiDs, a term we introduce to denote Service Level Objectives that are aware of its deployed devices, meaning that non-functional requirements must consider the characteristics of the hosting device. We demonstrate how AIF agents can be modeled and deployed alongside distributed services to manage them autonomously. Our experiments show that AIF agents achieve over 90% SLOiD fulfillment when using tested transition models, and around 80% when learning the models during deployment. We compare their performance to a multi-agent reinforcement learning algorithm, finding that while both approaches yield similar results, MARL requires extensive training, whereas AIF agents can operate effectively from the start. Additionally, we evaluate the behavior of AIF agents in offloading scenarios, observing a strong capacity for adaptation. Finally, we outline key research directions to advance AIF integration in CC platforms.
2D PZT MEMS Resonant Scanner Using a Three-Mask Process
This work presents the design, simulation, fabrication, and characterization of a novel architectural compact two-dimensional (2D) resonant MEMS scanning mirror actuated by thin-film lead zirconate titanate (PZT). The device employs an innovative mechanically coupled dual-axis architecture fabricated using a three-mask process on an SOI-PZT deposited wafer, significantly reducing system complexity while achieving high performance. The scanner integrates a 1 $\times$ 1.4 mm oval mirror within a 7 $\times$ 4.7 mm die, actuated by PZT thin-film elements optimized for resonant operation at 3.6 kHz (vertical) and 54.2 kHz (horizontal) under 12 V$_{\mathrm{p-p}}$ periodic pulse driving. The system achieves optical scan angles of 4.8$^\circ$ and 11.5$^\circ$ in vertical and horizontal directions, respectively, with quality factors of 750 (vertical) and 1050 (horizontal). These values contribute to high scanning bandwidth-efficiency products of 24.2 deg$\cdot$mm$\cdot$kHz (vertical) and 623 deg$\cdot$mm$\cdot$kHz (horizontal), among the higher values reported for 2D PZT-MEMS scanners. Finite element analysis confirmed minimal stress and mirror deformation, and experimental validation demonstrated excellent agreement with simulation results. This architecture demonstrates the feasibility of high-resolution laser scanning, as required in applications such as OCT, LiDAR, and displays, by achieving performance levels in line with those used in such systems.
comment: The first two listed authors contributed equally. Parviz Zolfaghari and Hakan Urey are the corresponding authors
Incremental Gain Computation and Regulation of Discrete-time Positive Luré Systems using Linear Programming
This work approaches the problem of computing incremental $\ell_1$ and $\ell_\infty$ gains for discrete-time positive systems in \lure feedback with static memoryless nonlinearities, and regulating the $\ell_\infty$ gain through the design of a state-feedback controller. Finite incremental gains provide a quantitative measure of robustness for trajectories, and will ensure that all pairs of trajectories will converge to a fixed point or will diverge together in the absence of an applied input. Upper-bounds on these incremental gains can be computed through linear programming. Computation and regulation of the $\ell_1$ and $\ell_\infty$ incremental gains are verified by numerical examples.
comment: 13 pages, 4 images
Damping LFOs: Grid Following with Power Oscillation Damping vs. Grid Forming vs. PSS
Low-frequency oscillations (LFOs) present a significant challenge to the stability and reliability of power systems, especially in grids with a high penetration of renewable energy sources. Traditional grid-following (GFL) inverters have proven less effective in damping such oscillations. This paper presents a GFL-power plant controller with an auxiliary power oscillation damping control for damping LFOs. This approach is compared with a traditional power system stabilizer (PSS) for a two-area power system. Next, the research is extended by deploying grid forming (GFM) controls, which by actively controlling the voltage and frequency dynamics emulate the behavior of traditional synchronous generators. The paper analyzes two GFM control strategies: virtual synchronous machine (VSM) and droop control, and demonstrates their effectiveness in damping LFOs in the test system. The simulation results reveal that the performance of the proposed GFM-VSM rivals that of the PSS and is better than the GFL-power oscillation damper.
A Causation-Based Framework for Pricing and Cost Allocation of Energy, Reserves, and Transmission in Modern Power Systems
The increasing vulnerability of power systems has heightened the need for operating reserves to manage contingencies such as generator outages, line failures, and sudden load variations. Unlike energy costs, driven by consumer demand, operating reserve costs arise from addressing the most critical credible contingencies - prompting the question: how should these costs be allocated through efficient pricing mechanisms? As an alternative to previously reported schemes, this paper presents a new causation-based pricing framework for electricity markets based on contingency-constrained energy and reserve scheduling models. Major salient features include a novel security charge mechanism along with the explicit definition of prices for up-spinning reserves, down-spinning reserves, and transmission services. These features ensure more comprehensive and efficient cost-reflective market operations. Moreover, the proposed nodal pricing scheme yields revenue adequacy and neutrality while promoting reliability incentives for generators based on the cost-causation principle. An additional salient aspect of the proposed framework is the economic incentive for transmission assets, which are remunerated based on their use to deliver energy and reserves across all contingency states. Numerical results from two case studies illustrate the performance of the proposed pricing scheme.
Humanoid Loco-Manipulations Pattern Generation and Stabilization Control
In order for a humanoid robot to perform loco-manipulation such as moving an object while walking, it is necessary to account for sustained or alternating external forces other than ground-feet reaction, resulting from humanoid-object contact interactions. In this letter, we propose a bipedal control strategy for humanoid loco-manipulation that can cope with such external forces. First, the basic formulas of the bipedal dynamics, i.e., linear inverted pendulum mode and divergent component of motion, are derived, taking into account the effects of external manipulation forces. Then, we propose a pattern generator to plan center of mass trajectories consistent with the reference trajectory of the manipulation forces, and a stabilizer to compensate for the error between desired and actual manipulation forces. The effectiveness of our controller is assessed both in simulation and loco-manipulation experiments with real humanoid robots.
Deception in Oligopoly Games via Adaptive Nash Seeking Systems
In the theory of multi-agent systems, deception refers to the strategic manipulation of information to influence the behavior of other agents, ultimately altering the long-term dynamics of the entire system. Recently, this concept has been examined in the context of model-free Nash equilibrium seeking (NES) algorithms for noncooperative games. Specifically, it was demonstrated that players can exploit knowledge of other players' exploration signals to drive the system toward a ``deceptive" Nash equilibrium, while maintaining the stability of the closed-loop system. To extend this insight beyond the duopoly case, in this paper we conduct a comprehensive study of deception mechanisms in N-player oligopoly markets. By leveraging the structure of these games and employing stability techniques for nonlinear dynamical systems, we provide game-theoretic insights into deception and derive specialized results, including stability conditions. These results allow players to systematically adjust their NES dynamics by tuning gains and signal amplitudes, all while ensuring closed-loop stability. Additionally, we introduce novel sufficient conditions to demonstrate that the (practically) stable equilibrium point of the deceptive dynamics corresponds to a true Nash equilibrium of a different game, which we term the ``deceptive game." Our results show that, under the proposed adaptive dynamics with deception, a victim firm may develop a distorted perception of its competitors' product appeal, which could lead to setting suboptimal prices.
Controller Design for Bilinear Neural Feedback Loops
This paper considers a class of bilinear systems with a neural network in the loop. These arise naturally when employing machine learning techniques to approximate general, non-affine in the input, control systems. We propose a controller design framework that combines linear fractional representations and tools from linear parameter varying control to guarantee local exponential stability of a desired equilibrium. The controller is obtained from the solution of linear matrix inequalities, which can be solved offline, making the approach suitable for online applications. The proposed methodology offers tools for stability and robustness analysis of deep neural networks interconnected with dynamical systems.
Enhancing Spatio-Temporal Resolution of Process-Based Life Cycle Analysis with Model-Based Systems Engineering \& Hetero-functional Graph Theory
Life cycle analysis (LCA) has emerged as a vital tool for assessing the environmental impacts of products, processes, and systems throughout their entire lifecycle. It provides a systematic approach to quantifying resource consumption, emissions, and waste, enabling industries, researchers, and policymakers to identify hotspots for sustainability improvements. By providing a comprehensive assessment of systems, from raw material extraction to end-of-life disposal, LCA facilitates the development of environmentally sound strategies, thereby contributing significantly to sustainable engineering and informed decision-making. Despite its strengths and ubiquitous use, life cycle analysis has not been reconciled with the broader literature in model-based systems engineering and analysis, thus hindering its integration into the design of complex systems more generally. This lack of reconciliation poses a significant problem, as it hinders the seamless integration of environmental sustainability into the design and optimization of complex systems. Without alignment between life cycle analysis (LCA) and model-based systems engineering (MBSE), sustainability remains an isolated consideration rather than an inherent part of the system's architecture and design. The original contribution of this paper is twofold. First, the paper reconciles process-based life cycle analysis with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory. It ultimately proves that model-based systems engineering and hetero-functional graph theory are a formal generalization of process-based life cycle analysis. Secondly, the paper demonstrates how model-based systems engineering and hetero-functional graph theory may be used to enhance the spatio-temporal resolution of process-based life cycle analysis in a manner that aligns with system design objectives.
comment: 12 pages, 5 figures
MRDust: Wireless Implant Data Uplink & Localization via Magnetic Resonance Image Modulation
Magnetic resonance imaging (MRI) exhibits rich and clinically useful endogenous contrast mechanisms, which can differentiate soft tissues and are sensitive to flow, diffusion, magnetic susceptibility, blood oxygenation level, and more. However, MRI sensitivity is ultimately constrained by Nuclear Magnetic Resonance (NMR) physics, and its spatiotemporal resolution is limited by SNR and spatial encoding. On the other hand, miniaturized implantable sensors offer highly localized physiological information, yet communication and localization can be challenging when multiple implants are present. This paper introduces the MRDust, an active ``contrast agent" that integrates active sensor implants with MRI, enabling the direct encoding of highly localized physiological data into MR images to augment the anatomical images. MRDust employs a micrometer-scale on-chip coil to actively modulate the local magnetic field, enabling MR signal amplitude and phase modulation for digital data transmission. Since MRI inherently captures the anatomical tissue structure, this method has the potential to enable simultaneous data communication, localization, and image registration with multiple implants. This paper presents the underlying physical principles, design tradeoffs, and design methodology for this approach. To validate the concept, a 900 $\times$ 990 $\mu$m$^2$ chip was designed using TSMC 28 nm technology, with an on-chip coil measuring 630 $\mu$m in diameter. The chip was tested with custom hardware in an MR750W GE3T MRI scanner. Successful voxel amplitude modulation is demonstrated with Spin-Echo Echo-Planar-Imaging (SE-EPI) sequence, achieving a contrast-to-noise ratio (CNR) of 25.58 with a power consumption of 130 $\mu$W.
comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (T-BioCAS) for review
Efficient Gaussian Mixture Filters based on Transition Density Approximation
Gaussian mixture filters for nonlinear systems usually rely on severe approximations when calculating mixtures in the prediction and filtering step. Thus, offline approximations of noise densities by Gaussian mixture densities to reduce the approximation error have been proposed. This results in exponential growth in the number of components, requiring ongoing component reduction, which is computationally complex. In this paper, the key idea is to approximate the true transition density by an axis-aligned Gaussian mixture, where two different approaches are derived. These approximations automatically ensure a constant number of components in the posterior densities without the need for explicit reduction. In addition, they allow a trade-off between estimation quality and computational complexity.
comment: Submitted to the conference FUSION 2025
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
Risk-Sensitive Model Predictive Control for Interaction-Aware Planning -- A Sequential Convexification Algorithm
This paper considers risk-sensitive model predictive control for stochastic systems with a decision-dependent distribution. This class of systems is commonly found in human-robot interaction scenarios. We derive computationally tractable convex upper bounds to both the objective function, and to frequently used penalty terms for collision avoidance, allowing us to efficiently solve the generally nonconvex optimal control problem as a sequence of convex problems. Simulations of a robot navigating a corridor demonstrate the effectiveness and the computational advantage of the proposed approach.
Non-asymptotic Error Analysis of Subspace Identification for Deterministic Systems
The subspace identification method (SIM) has been extensively employed in the identification of discrete-time multiple-input multiple-output (MIMO) linear time-invariant (LTI) systems. This paper focuses on the analysis of perturbation errors for the system matrices in state-space models and the corresponding system poles, under two unified SIMs, based on a single finite-length input/output sample trajectory. Specifically, we derive non-asymptotic upper bounds on these errors, providing a unified perspective across various SIM variants. Furthermore, we prove that SIMs become ill-conditioned for MIMO systems when the state-to-output dimensionality ratio $n/m$ is large, regardless of system parameters. Finally, numerical experiments are conducted to validate the non-asymptotic results and the ill-conditionedness of SIMs.
comment: 33 pages, 3 figures
Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems
In this letter, we propose an iterative joint detection algorithm of Kalman filter (KF) and channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes the prior information of control system to improve control and communication performance. In this algorithm, we first use the KF to estimate the probability density of the control system outputs and calculate the prior probability of received signals to assist decoder. Then, the possible outputs of the control system are traversed to update the prior probability in order to implement iterative detection. The simulation results show that the prior information and the iterative structure can reduce the block error rate performance of communications while improving the root mean square error performance of controls.
comment: 5 pages, 4 figures
A Tunable Universal Formula for Safety-Critical Control
Sontag's universal formula is a widely used technique for stabilizing control through control Lyapunov functions. Recently, it has been extended to address safety-critical control by incorporating control barrier functions (CBFs). However, deriving a universal formula that satisfies requirements on essential properties, including safety, smoothness, and robustness against input disturbances, is still an open problem. To address this challenge, this paper introduces a novel solution - a tunable universal formula - by incorporating a (state-dependent) tunable term into Sontag's formula. This tunable term enables the regulation of safety-critical control performances, allowing the attainment of desired properties through a proper selection of tunable terms. Generally, the tunable universal formula can be seen as a controller that improves the quadratic program (QP)-synthesized controllers in terms of robustness and smoothness, while also reducing the conservatism (corresponding to robustness) in Sontag's formula. Furthermore, we extend the tunable universal formula to address safety-critical control problems with norm-bounded input constraints, showcasing its applicability across diverse control scenarios. Finally, we demonstrate the efficacy of our method through a two-link manipulator safe tracking example, investigating the essential properties including safety, smoothness, and robustness against input disturbances under various tunable terms.
Extended Set-based Tasks for Multi-task Execution and Prioritization
The ability of executing multiple tasks simultaneously is an important feature of redundant robotic systems. As a matter of fact, complex behaviors can often be obtained as a result of the execution of several tasks. Moreover, in safety-critical applications, tasks designed to ensure the safety of the robot and its surroundings have to be executed along with other nominal tasks. In such cases, it is also important to prioritize the former over the latter. In this paper, we formalize the definition of extended set-based tasks, i.e., tasks which can be executed by rendering subsets of the task space asymptotically stable or forward invariant using control barrier functions. We propose a formal mathematical representation of such tasks that allows for the execution of more complex and time-varying prioritized stacks of tasks using kinematic and dynamic robot models alike. We present an optimization-based framework which is computationally efficient, accounts for input bounds, and allows for the stable execution of time-varying prioritized stacks of extended set-based tasks. The proposed framework is validated using extensive simulations, quantitative comparisons to the state-of-the-art hierarchical quadratic programming, and experiments with robotic manipulators.
End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
comment: This paper was originally submitted as a class project and included the name of a faculty member without prior permission. At the instructor's request, I am withdrawing the paper. The work may be resubmitted in the future after further development and testing
On zero-shot learning in neural state estimation of power distribution systems
This paper addresses the challenge of neural state estimation in power distribution systems. We identified a research gap in the current state of the art, which lies in the inability of models to adapt to changes in the power grid, such as loss of sensors and branch switching, in a zero-shot fashion. Based on the literature, we identified graph neural networks as the most promising class of models for this use case. Our experiments confirm their robustness to some grid changes and also show that a deeper network does not always perform better. We propose data augmentations to improve performance and conduct a comprehensive grid search of different model configurations for common zero-shot learning scenarios.
comment: 13 pages, 2 figures, associated source code available at https://gitlab.com/transense/nse-tl-paper/tree/IARIA
Robotics
AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning SIGGRAPH 2025
Reinforcement learning (RL) has significantly advanced the control of physics-based and robotic characters that track kinematic reference motion. However, methods typically rely on a weighted sum of conflicting reward functions, requiring extensive tuning to achieve a desired behavior. Due to the computational cost of RL, this iterative process is a tedious, time-intensive task. Furthermore, for robotics applications, the weights need to be chosen such that the policy performs well in the real world, despite inevitable sim-to-real gaps. To address these challenges, we propose a multi-objective reinforcement learning framework that trains a single policy conditioned on a set of weights, spanning the Pareto front of reward trade-offs. Within this framework, weights can be selected and tuned after training, significantly speeding up iteration time. We demonstrate how this improved workflow can be used to perform highly dynamic motions with a robot character. Moreover, we explore how weight-conditioned policies can be leveraged in hierarchical settings, using a high-level policy to dynamically select weights according to the current task. We show that the multi-objective policy encodes a diverse spectrum of behaviors, facilitating efficient adaptation to novel tasks.
comment: SIGGRAPH 2025
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge_insulation.
Mobi-$π$: Mobilizing Your Robot Learning Policy
Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. To study policy mobilization, we introduce the Mobi-$\pi$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, (3) visualization tools for analysis, and (4) several baseline methods. We also propose a novel approach that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. We show that our approach outperforms baselines in both simulation and real-world environments, demonstrating its effectiveness for policy mobilization.
comment: Project website: https://mobipi.github.io/
Autoregressive Meta-Actions for Unified Controllable Trajectory Generation
Controllable trajectory generation guided by high-level semantic decisions, termed meta-actions, is crucial for autonomous driving systems. A significant limitation of existing frameworks is their reliance on invariant meta-actions assigned over fixed future time intervals, causing temporal misalignment with the actual behavior trajectories. This misalignment leads to irrelevant associations between the prescribed meta-actions and the resulting trajectories, disrupting task coherence and limiting model performance. To address this challenge, we introduce Autoregressive Meta-Actions, an approach integrated into autoregressive trajectory generation frameworks that provides a unified and precise definition for meta-action-conditioned trajectory prediction. Specifically, We decompose traditional long-interval meta-actions into frame-level meta-actions, enabling a sequential interplay between autoregressive meta-action prediction and meta-action-conditioned trajectory generation. This decomposition ensures strict alignment between each trajectory segment and its corresponding meta-action, achieving a consistent and unified task formulation across the entire trajectory span and significantly reducing complexity. Moreover, we propose a staged pre-training process to decouple the learning of basic motion dynamics from the integration of high-level decision control, which offers flexibility, stability, and modularity. Experimental results validate our framework's effectiveness, demonstrating improved trajectory adaptivity and responsiveness to dynamic decision-making scenarios. We provide the video document and dataset, which are available at https://arma-traj.github.io/.
Collaborative Last-Mile Delivery: A Multi-Platform Vehicle Routing Problem With En-route Charging
The rapid growth of e-commerce and the increasing demand for timely, cost-effective last-mile delivery have increased interest in collaborative logistics. This research introduces a novel collaborative synchronized multi-platform vehicle routing problem with drones and robots (VRP-DR), where a fleet of $\mathcal{M}$ trucks, $\mathcal{N}$ drones and $\mathcal{K}$ robots, cooperatively delivers parcels. Trucks serve as mobile platforms, enabling the launching, retrieving, and en-route charging of drones and robots, thereby addressing critical limitations such as restricted payload capacities, limited range, and battery constraints. The VRP-DR incorporates five realistic features: (1) multi-visit service per trip, (2) multi-trip operations, (3) flexible docking, allowing returns to the same or different trucks (4) cyclic and acyclic operations, enabling return to the same or different nodes; and (5) en-route charging, enabling drones and robots to recharge while being transported on the truck, maximizing operational efficiency by utilizing idle transit time. The VRP-DR is formulated as a mixed-integer linear program (MILP) to minimize both operational costs and makespan. To overcome the computational challenges of solving large-scale instances, a scalable heuristic algorithm, FINDER (Flexible INtegrated Delivery with Energy Recharge), is developed, to provide efficient, near-optimal solutions. Numerical experiments across various instance sizes evaluate the performance of the MILP and heuristic approaches in terms of solution quality and computation time. The results demonstrate significant time savings of the combined delivery mode over the truck-only mode and substantial cost reductions from enabling multi-visits. The study also provides insights into the effects of en-route charging, docking flexibility, drone count, speed, and payload capacity on system performance.
Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms
Small Uncrewed Aerial Systems (sUAS) are increasingly deployed as autonomous swarms in search-and-rescue and other disaster-response scenarios. In these settings, they use computer vision (CV) to detect objects of interest and autonomously adapt their missions. However, traditional CV systems often struggle to recognize unfamiliar objects in open-world environments or to infer their relevance for mission planning. To address this, we incorporate large language models (LLMs) to reason about detected objects and their implications. While LLMs can offer valuable insights, they are also prone to hallucinations and may produce incorrect, misleading, or unsafe recommendations. To ensure safe and sensible decision-making under uncertainty, high-level decisions must be governed by cognitive guardrails. This article presents the design, simulation, and real-world integration of these guardrails for sUAS swarms in search-and-rescue missions.
comment: 16 pages, 8 figures
A Robot-Assisted Approach to Small Talk Training for Adults with ASD
From dating to job interviews, making new friends or simply chatting with the cashier at checkout, engaging in small talk is a vital, everyday social skill. For adults with Autism Spectrum Disorder (ASD), small talk can be particularly challenging, yet it is essential for social integration, building relationships, and accessing professional opportunities. In this study, we present our development and evaluation of an in-home autonomous robot system that allows users to practice small talk. Results from the week-long study show that adults with ASD enjoyed the training, made notable progress in initiating conversations and improving eye contact, and viewed the system as a valuable tool for enhancing their conversational skills.
comment: Accepted for publication in Robotics: Science and Systems (RSS) 2025, 14 pages, 4 figures,
Humanoid Loco-manipulation Planning based on Graph Search and Reachability Maps
In this letter, we propose an efficient and highly versatile loco-manipulation planning for humanoid robots. Loco-manipulation planning is a key technological brick enabling humanoid robots to autonomously perform object transportation by manipulating them. We formulate planning of the alternation and sequencing of footsteps and grasps as a graph search problem with a new transition model that allows for a flexible representation of loco-manipulation. Our transition model is quickly evaluated by relocating and switching the reachability maps depending on the motion of both the robot and object. We evaluate our approach by applying it to loco-manipulation use-cases, such as a bobbin rolling operation with regrasping, where the motion is automatically planned by our framework.
Optimization-based Posture Generation for Whole-body Contact Motion by Contact Point Search on the Body Surface
Whole-body contact is an effective strategy for improving the stability and efficiency of the motion of robots. For robots to automatically perform such motions, we propose a posture generation method that employs all available surfaces of the robot links. By representing the contact point on the body surface by two-dimensional configuration variables, the joint positions and contact points are simultaneously determined through a gradient-based optimization. By generating motions with the proposed method, we present experiments in which robots manipulate objects effectively utilizing whole-body contact.
Centroidal Trajectory Generation and Stabilization based on Preview Control for Humanoid Multi-contact Motion
Multi-contact motion is important for humanoid robots to work in various environments. We propose a centroidal online trajectory generation and stabilization control for humanoid dynamic multi-contact motion. The proposed method features the drastic reduction of the computational cost by using preview control instead of the conventional model predictive control that considers the constraints of all sample times. By combining preview control with centroidal state feedback for robustness to disturbances and wrench distribution for satisfying contact constraints, we show that the robot can stably perform a variety of multi-contact motions through simulation experiments.
Long Duration Inspection of GNSS-Denied Environments with a Tethered UAV-UGV Marsupial System
Unmanned Aerial Vehicles (UAVs) have become essential tools in inspection and emergency response operations due to their high maneuverability and ability to access hard-to-reach areas. However, their limited battery life significantly restricts their use in long-duration missions. This paper presents a novel tethered marsupial robotic system composed of a UAV and an Unmanned Ground Vehicle (UGV), specifically designed for autonomous, long-duration inspection tasks in Global Navigation Satellite System (GNSS)-denied environments. The system extends the UAV's operational time by supplying power through a tether connected to high-capacity battery packs carried by the UGV. We detail the hardware architecture based on off-the-shelf components to ensure replicability and describe our full-stack software framework, which is composed of open-source components and built upon the Robot Operating System (ROS). The proposed software architecture enables precise localization using a Direct LiDAR Localization (DLL) method and ensures safe path planning and coordinated trajectory tracking for the integrated UGV-tether-UAV system. We validate the system through three field experiments: (1) a manual flight endurance test to estimate the operational duration, (2) an autonomous navigation test, and (3) an inspection mission to demonstrate autonomous inspection capabilities. Experimental results confirm the robustness and autonomy of the system, its capacity to operate in GNSS-denied environments, and its potential for long-endurance, autonomous inspection and monitoring tasks.
comment: 30 pages, 15 figures, 3 tables, 1 algorithm. Submitted to Journal of Intelligent & Robotic Systems
Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents
Long-horizon robotic manipulation poses significant challenges for autonomous systems, requiring extended reasoning, precise execution, and robust error recovery across complex sequential tasks. Current approaches, whether based on static planning or end-to-end visuomotor policies, suffer from error accumulation and lack effective verification mechanisms during execution, limiting their reliability in real-world scenarios. We present Agentic Robot, a brain-inspired framework that addresses these limitations through Standardized Action Procedures (SAP)--a novel coordination protocol governing component interactions throughout manipulation tasks. Drawing inspiration from Standardized Operating Procedures (SOPs) in human organizations, SAP establishes structured workflows for planning, execution, and verification phases. Our architecture comprises three specialized components: (1) a large reasoning model that decomposes high-level instructions into semantically coherent subgoals, (2) a vision-language-action executor that generates continuous control commands from real-time visual inputs, and (3) a temporal verifier that enables autonomous progression and error recovery through introspective assessment. This SAP-driven closed-loop design supports dynamic self-verification without external supervision. On the LIBERO benchmark, Agentic Robot achieves state-of-the-art performance with an average success rate of 79.6\%, outperforming SpatialVLA by 6.1\% and OpenVLA by 7.4\% on long-horizon tasks. These results demonstrate that SAP-driven coordination between specialized components enhances both performance and interpretability in sequential manipulation, suggesting significant potential for reliable autonomous systems. Project Github: https://agentic-robot.github.io.
comment: 18 pages, 8 figures
MEF-Explore: Communication-Constrained Multi-Robot Entropy-Field-Based Exploration
Collaborative multiple robots for unknown environment exploration have become mainstream due to their remarkable performance and efficiency. However, most existing methods assume perfect robots' communication during exploration, which is unattainable in real-world settings. Though there have been recent works aiming to tackle communication-constrained situations, substantial room for advancement remains for both information-sharing and exploration strategy aspects. In this paper, we propose a Communication-Constrained Multi-Robot Entropy-Field-Based Exploration (MEF-Explore). The first module of the proposed method is the two-layer inter-robot communication-aware information-sharing strategy. A dynamic graph is used to represent a multi-robot network and to determine communication based on whether it is low-speed or high-speed. Specifically, low-speed communication, which is always accessible between every robot, can only be used to share their current positions. If robots are within a certain range, high-speed communication will be available for inter-robot map merging. The second module is the entropy-field-based exploration strategy. Particularly, robots explore the unknown area distributedly according to the novel forms constructed to evaluate the entropies of frontiers and robots. These entropies can also trigger implicit robot rendezvous to enhance inter-robot map merging if feasible. In addition, we include the duration-adaptive goal-assigning module to manage robots' goal assignment. The simulation results demonstrate that our MEF-Explore surpasses the existing ones regarding exploration time and success rate in all scenarios. For real-world experiments, our method leads to a 21.32% faster exploration time and a 16.67% higher success rate compared to the baseline.
comment: This paper has been accepted for publication in IEEE Transactions on Automation Science and Engineering
VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation
Path planning is a fundamental capability of autonomous Unmanned Aerial Vehicles (UAVs), enabling them to efficiently navigate toward a target region or explore complex environments while avoiding obstacles. Traditional pathplanning methods, such as Rapidly-exploring Random Trees (RRT), have proven effective but often encounter significant challenges. These include high search space complexity, suboptimal path quality, and slow convergence, issues that are particularly problematic in high-stakes applications like disaster response, where rapid and efficient planning is critical. To address these limitations and enhance path-planning efficiency, we propose Vision Language Model RRT (VLM-RRT), a hybrid approach that integrates the pattern recognition capabilities of Vision Language Models (VLMs) with the path-planning strengths of RRT. By leveraging VLMs to provide initial directional guidance based on environmental snapshots, our method biases sampling toward regions more likely to contain feasible paths, significantly improving sampling efficiency and path quality. Extensive quantitative and qualitative experiments with various state-of-the-art VLMs demonstrate the effectiveness of this proposed approach.
UPP: Unified Path Planner with Adaptive Safety and Optimality
We are surrounded by robots helping us perform complex tasks. Robots have a wide range of applications, from industrial automation to personalized assistance. However, with great technological innovation come significant challenges. One of the major challenges in robotics is path planning. Despite advancements such as graph search, sampling, and potential field methods, most path planning algorithms focus either on optimality or on safety. Very little research addresses both simultaneously. We propose a Unified Path Planner (UPP) that uses modified heuristics and a dynamic safety cost function to balance safety and optimality. The level of safety can be adjusted via tunable parameters, trading off against computational complexity. We demonstrate the planner's performance in simulations, showing how parameter variation affects results. UPP is compared with various traditional and safe-optimal planning algorithms across different scenarios. We also validate it on a TurtleBot, where the robot successfully finds safe and sub-optimal paths.
comment: 8 pages,11 figures
TrackVLA: Embodied Visual Tracking in the Wild
Embodied visual tracking is a fundamental skill in Embodied AI, enabling an agent to follow a specific target in dynamic environments using only egocentric vision. This task is inherently challenging as it requires both accurate target recognition and effective trajectory planning under conditions of severe occlusion and high scene dynamics. Existing approaches typically address this challenge through a modular separation of recognition and planning. In this work, we propose TrackVLA, a Vision-Language-Action (VLA) model that learns the synergy between object recognition and trajectory planning. Leveraging a shared LLM backbone, we employ a language modeling head for recognition and an anchor-based diffusion model for trajectory planning. To train TrackVLA, we construct an Embodied Visual Tracking Benchmark (EVT-Bench) and collect diverse difficulty levels of recognition samples, resulting in a dataset of 1.7 million samples. Through extensive experiments in both synthetic and real-world environments, TrackVLA demonstrates SOTA performance and strong generalizability. It significantly outperforms existing methods on public benchmarks in a zero-shot manner while remaining robust to high dynamics and occlusion in real-world scenarios at 10 FPS inference speed. Our project page is: https://pku-epic.github.io/TrackVLA-web.
LocoTouch: Learning Dexterous Quadrupedal Transport with Tactile Sensing
Quadrupedal robots have demonstrated remarkable agility and robustness in traversing complex terrains. However, they remain limited in performing object interactions that require sustained contact. In this work, we present LocoTouch, a system that equips quadrupedal robots with tactile sensing to address a challenging task in this category: long-distance transport of unsecured cylindrical objects, which typically requires custom mounting mechanisms to maintain stability. For efficient large-area tactile sensing, we design a high-density distributed tactile sensor array that covers the entire back of the robot. To effectively leverage tactile feedback for locomotion control, we develop a simulation environment with high-fidelity tactile signals, and train tactile-aware transport policies using a two-stage learning pipeline. Furthermore, we design a novel reward function to promote stable, symmetric, and frequency-adaptive locomotion gaits. After training in simulation, LocoTouch transfers zero-shot to the real world, reliably balancing and transporting a wide range of unsecured, cylindrical everyday objects with broadly varying sizes and weights. Thanks to the responsiveness of the tactile sensor and the adaptive gait reward, LocoTouch can robustly balance objects with slippery surfaces over long distances, or even under severe external perturbations.
comment: Project page: https://linchangyi1.github.io/LocoTouch
Eye-tracking-Driven Shared Control for Robotic Arms:Wizard of Oz Studies to Assess Design Choices
Advances in eye-tracking control for assistive robotic arms provide intuitive interaction opportunities for people with physical disabilities. Shared control has gained interest in recent years by improving user satisfaction through partial automation of robot control. We present an eye-tracking-guided shared control design based on insights from state-of-the-art literature. A Wizard of Oz setup was used in which automation was simulated by an experimenter to evaluate the concept without requiring full implementation. This approach allowed for rapid exploration of user needs and expectations to inform future iterations. Two studies were conducted to assess user experience, identify design challenges, and find improvements to ensure usability and accessibility. The first study involved people with disabilities by providing a survey, and the second study used the Wizard of Oz design in person to gain technical insights, leading to a comprehensive picture of findings.
comment: Preprint, 23 pages
System Identification for Virtual Sensor-Based Model Predictive Control: Application to a 2-DoF Direct-Drive Robotic Arm
Nonlinear Model Predictive Control (NMPC) offers a powerful approach for controlling complex nonlinear systems, yet faces two key challenges. First, accurately modeling nonlinear dynamics remains difficult. Second, variables directly related to control objectives often cannot be directly measured during operation. Although high-cost sensors can acquire these variables during model development, their use in practical deployment is typically infeasible. To overcome these limitations, we propose a Predictive Virtual Sensor Identification (PVSID) framework that leverages temporary high-cost sensors during the modeling phase to create virtual sensors for NMPC implementation. We validate PVSID on a Two-Degree-of-Freedom (2-DoF) direct-drive robotic arm with complex joint interactions, capturing tip position via motion capture during modeling and utilize an Inertial Measurement Unit (IMU) in NMPC. Experimental results show our NMPC with identified virtual sensors achieves precise tip trajectory tracking without requiring the motion capture system during operation. PVSID offers a practical solution for implementing optimal control in nonlinear systems where the measurement of key variables is constrained by cost or operational limitations.
comment: 6 pages, 5 figures, submitted to L-CSS with CDC 2025 option
Redundancy Parameterization of the ABB YuMi Robot Arm
The ABB YuMi is a 7-DOF collaborative robot arm with a complex, redundant kinematic structure. Path planning for the YuMi is challenging, especially with joint limits considered. The redundant degree of freedom is parameterized by the Shoulder-Elbow-Wrist (SEW) angle, called the arm angle by ABB, but the exact definition must be known for path planning outside the RobotStudio simulator. We provide the first complete and validated definition of the SEW angle used for the YuMi. It follows the conventional SEW angle formulation with the shoulder-elbow direction chosen to be the direction of the fourth joint axis. Our definition also specifies the shoulder location, making it compatible with any choice of reference vector. A previous attempt to define the SEW angle exists in the literature, but it is incomplete and deviates from the behavior observed in RobotStudio. Because our formulation fits within the general SEW angle framework, we also obtain the expression for the SEW angle Jacobian and complete numerical conditions for all algorithmic singularities. Finally, we demonstrate using IK-Geo, our inverse kinematics (IK) solver based on subproblem decomposition, to find all IK solutions using 2D search. Code examples are available in a publicly accessible repository.
comment: 8 pages, 5 figures
A Constructed Response: Designing and Choreographing Robot Arm Movements in Collaborative Dance Improvisation
Dancers often prototype movements themselves or with each other during improvisation and choreography. How are these interactions altered when physically manipulable technologies are introduced into the creative process? To understand how dancers design and improvise movements while working with instruments capable of non-humanoid movements, we engaged dancers in workshops to co-create movements with a robot arm in one-human-to-one-robot and three-human-to-one-robot settings. We found that dancers produced more fluid movements in one-to-one scenarios, experiencing a stronger sense of connection and presence with the robot as a co-dancer. In three-to-one scenarios, the dancers divided their attention between the human dancers and the robot, resulting in increased perceived use of space and more stop-and-go movements, perceiving the robot as part of the stage background. This work highlights how technologies can drive creativity in movement artists adapting to new ways of working with physical instruments, contributing design insights supporting artistic collaborations with non-humanoid agents.
Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration
Object-Goal Navigation (OGN) remains challenging in real-world, multi-floor environments and under open-vocabulary object descriptions. We observe that most episodes in widely used benchmarks such as HM3D and MP3D involve multi-floor buildings, with many requiring explicit floor transitions. However, existing methods are often limited to single-floor settings or predefined object categories. To address these limitations, we tackle two key challenges: (1) efficient cross-level planning and (2) zero-shot object-goal navigation (ZS-OGN), where agents must interpret novel object descriptions without prior exposure. We propose ASCENT, a framework that combines a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module leveraging Large Language Models (LLMs) for context-aware exploration, without requiring additional training on new object semantics or locomotion data. Our method outperforms state-of-the-art ZS-OGN approaches on HM3D and MP3D benchmarks while enabling efficient multi-floor navigation. We further validate its practicality through real-world deployment on a quadruped robot, achieving successful object exploration across unseen floors.
comment: 34 pages, 12 figures, 10 tables
Structural Abstraction and Selective Refinement for Formal Verification
Safety verification of robot applications is extremely challenging due to the complexity of the environment that a robot typically operates in. Formal verification with model-checking provides guarantees but it may often take too long or even fail for complex models of the environment. A usual solution approach is abstraction, more precisely behavioral abstraction. Our new approach introduces structural abstraction instead, which we investigated in the context of voxel representation of the robot environment. This kind of abstraction leads to abstract voxels. We also propose a complete and automated verification workflow, which is based on an already existing methodology for robot applications, and inspired by the key ideas behind counterexample-guided abstraction refinement (CEGAR) - performing an initial abstraction and successively introducing refinements based on counterexamples, intertwined with model-checker runs. Hence, our approach uses selective refinement of structural abstractions to improve the runtime efficiency of model-checking. A fully-automated implementation of our approach showed its feasibility, since counterexamples have been found for a realistic scenario with a fairly high (maximal) resolution in a few minutes, while direct model-checker runs led to a crash after a couple of days.
comment: 10 pages
Learning coordinated badminton skills for legged manipulators
Coordinating the motion between lower and upper limbs and aligning limb control with perception are substantial challenges in robotics, particularly in dynamic environments. To this end, we introduce an approach for enabling legged mobile manipulators to play badminton, a task that requires precise coordination of perception, locomotion, and arm swinging. We propose a unified reinforcement learning-based control policy for whole-body visuomotor skills involving all degrees of freedom to achieve effective shuttlecock tracking and striking. This policy is informed by a perception noise model that utilizes real-world camera data, allowing for consistent perception error levels between simulation and deployment and encouraging learned active perception behaviors. Our method includes a shuttlecock prediction model, constrained reinforcement learning for robust motion control, and integrated system identification techniques to enhance deployment readiness. Extensive experimental results in a variety of environments validate the robot's capability to predict shuttlecock trajectories, navigate the service area effectively, and execute precise strikes against human players, demonstrating the feasibility of using legged mobile manipulators in complex and dynamic sports scenarios.
comment: Science Robotics DOI: 10.1126/scirobotics.adu3922
A Benchmark Reference for ESP32-CAM Module SP32
The ESP32-CAM is one of the most widely adopted open-source modules for prototyping embedded vision applications. Since its release in 2019, it has gained popularity among both hobbyists and professional developers due to its affordability, versatility, and integrated wireless capabilities. Despite its widespread use, comprehensive documentation of the performance metrics remains limited. This study addresses this gap by collecting and analyzing over six hours of real-time video streaming logs across all supported resolutions of the OV2640 image sensor, tested under five distinct voltage conditions via an HTTP-based WiFi connection. A long standing bug in the official Arduino ESP32 driver, responsible for inaccurate frame rate logging, was fixed. The resulting analysis includes key performance metrics such as instantaneous and average frame rate, total streamed data, transmission count, and internal chip temperature. The influence of varying power levels was evaluated to assess the reliability of the module.
comment: Full work available at GitHub: https://github.com/TNeutron/ESP32-CAM-Performence-Reference-Benchmark
DiffCoTune: Differentiable Co-Tuning for Cross-domain Robot Control
The deployment of robot controllers is hindered by modeling discrepancies due to necessary simplifications for computational tractability or inaccuracies in data-generating simulators. Such discrepancies typically require ad-hoc tuning to meet the desired performance, thereby ensuring successful transfer to a target domain. We propose a framework for automated, gradient-based tuning to enhance performance in the deployment domain by leveraging differentiable simulators. Our method collects rollouts in an iterative manner to co-tune the simulator and controller parameters, enabling systematic transfer within a few trials in the deployment domain. Specifically, we formulate multi-step objectives for tuning and employ alternating optimization to effectively adapt the controller to the deployment domain. The scalability of our framework is demonstrated by co-tuning model-based and learning-based controllers of arbitrary complexity for tasks ranging from low-dimensional cart-pole stabilization to high-dimensional quadruped and biped tracking, showing performance improvements across different deployment domains.
comment: 8 pages, 8 figures
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*
Graph search planners have been widely used for 3D path planning in the literature, and Euclidean Distance Fields (EDFs) are increasingly being used as a representation of the environment. However, to the best of our knowledge, the integration of EDFs into heuristic planning has been carried out in a loosely coupled fashion, dismissing EDF properties that can be used to accelerate/improve the planning process and enhance the safety margins of the resultant trajectories. This paper presents a fast graph search planner based on a modified Lazy Theta* planning algorithm for aerial robots in challenging 3D environments that exploits the EDF properties. The proposed planner outperforms classic graph search planners in terms of path smoothness and safety. It integrates EDFs as environment representation and directly generates fast and smooth paths avoiding the use of post-processing methods; it also considers the analytical properties of EDFs to obtain an approximation of the EDF cost along the line-of-sight segments and to reduce the number of visibility neighbours, which directly impacts the computation time. Moreover, we demonstrate that the proposed EDF-based cost function satisfies the triangle inequality, which reduces calculations during exploration and, hence, computation time. Many experiments and comparatives are carried out in 3D challenging indoor and outdoor simulation environments to evaluate and validate the proposed planner. The results show an efficient and safe planner in these environments.
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Vision-language-action (VLA) models have shown promise as generalist robotic policies by jointly leveraging visual, linguistic, and proprioceptive modalities to generate action trajectories. While recent benchmarks have advanced VLA research in domestic tasks, professional science-oriented domains remain underexplored. We introduce AutoBio, a simulation framework and benchmark designed to evaluate robotic automation in biology laboratory environments--an application domain that combines structured protocols with demanding precision and multimodal interaction. AutoBio extends existing simulation capabilities through a pipeline for digitizing real-world laboratory instruments, specialized physics plugins for mechanisms ubiquitous in laboratory workflows, and a rendering stack that support dynamic instrument interfaces and transparent materials through physically based rendering. Our benchmark comprises biologically grounded tasks spanning three difficulty levels, enabling standardized evaluation of language-guided robotic manipulation in experimental protocols. We provide infrastructure for demonstration generation and seamless integration with VLA models. Baseline evaluations with two SOTA VLA models reveal significant gaps in precision manipulation, visual reasoning, and instruction following in scientific workflows. By releasing AutoBio, we aim to catalyze research on generalist robotic systems for complex, high-precision, and multimodal professional environments. The simulator and benchmark are publicly available to facilitate reproducible research.
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
Recent advancements in world models have revolutionized dynamic environment simulation, allowing systems to foresee future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches exhibit deficiencies in maintaining robust 3D geometric consistency or accumulating artifacts during occlusion handling, both critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering based on the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model can generalize to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.
comment: code will be released at https://github.com/antonioo-c/GeoDrive
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, whose performance is comparable to fine-tuned DDIM policies while saving computation time for an average of 23.20%. Project webpage: https://reinflow.github.io/
comment: 30 pages, 13 figures, 10 tables
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
We present DexUMI - a data collection and policy learning framework that uses the human hand as the natural interface to transfer dexterous manipulation skills to various robot hands. DexUMI includes hardware and software adaptations to minimize the embodiment gap between the human hand and various robot hands. The hardware adaptation bridges the kinematics gap using a wearable hand exoskeleton. It allows direct haptic feedback in manipulation data collection and adapts human motion to feasible robot hand motion. The software adaptation bridges the visual gap by replacing the human hand in video data with high-fidelity robot hand inpainting. We demonstrate DexUMI's capabilities through comprehensive real-world experiments on two different dexterous robot hand hardware platforms, achieving an average task success rate of 86%.
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Humans practice slow thinking before performing actual actions when handling complex tasks in the physical world. This thinking paradigm, recently, has achieved remarkable advancement in boosting Large Language Models (LLMs) to solve complex tasks in digital domains. However, the potential of slow thinking remains largely unexplored for robotic foundation models interacting with the physical world. In this work, we propose Hume: a dual-system Vision-Language-Action (VLA) model with value-guided System-2 thinking and cascaded action denoising, exploring human-like thinking capabilities of Vision-Language-Action models for dexterous robot control. System 2 of Hume implements value-Guided thinking by extending a Vision-Language-Action Model backbone with a novel value-query head to estimate the state-action value of predicted actions. The value-guided thinking is conducted by repeat sampling multiple action candidates and selecting one according to state-action value. System 1 of Hume is a lightweight reactive visuomotor policy that takes System 2 selected action and performs cascaded action denoising for dexterous robot control. At deployment time, System 2 performs value-guided thinking at a low frequency while System 1 asynchronously receives the System 2 selected action candidate and predicts fluid actions in real time. We show that Hume outperforms the existing state-of-the-art Vision-Language-Action models across multiple simulation benchmark and real-robot deployments.
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications -- parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
comment: Project webpage: https://younggyo.me/fast_td3
Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations
Autonomous Underwater Vehicles (AUVs) encounter significant energy, control and navigation challenges in complex underwater environments, particularly during close-proximity operations, such as launch and recovery (LAR), where fluid interactions and wake effects present additional navigational and energy challenges. Traditional path planning methods fail to incorporate these detailed wake structures, resulting in increased energy consumption, reduced control stability, and heightened safety risks. This paper presents a novel wake-informed, 3D path planning approach that fully integrates localized wake effects and global currents into the planning algorithm. Two variants of the A* algorithm - a current-informed planner and a wake-informed planner - are created to assess its validity and two neural network models are then trained to approximate these planners for real-time applications. Both the A* planners and NN models are evaluated using important metrics such as energy expenditure, path length, and encounters with high-velocity and turbulent regions. The results demonstrate a wake-informed A* planner consistently achieves the lowest energy expenditure and minimizes encounters with high-velocity regions, reducing energy consumption by up to 11.3%. The neural network models are observed to offer computational speedup of 6 orders of magnitude, but exhibit 4.51 - 19.79% higher energy expenditures and 9.81 - 24.38% less optimal paths. These findings underscore the importance of incorporating detailed wake structures into traditional path planning algorithms and the benefits of neural network approximations to enhance energy efficiency and operational safety for AUVs in complex 3D domains.
comment: 11 pages, 6 figures, preprint of journal paper
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding CVPR 2025
3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on textual descriptions, essential for applications like augmented reality and robotics. Traditional 3DVG approaches rely on annotated 3D datasets and predefined object categories, limiting scalability and adaptability. To overcome these limitations, we introduce SeeGround, a zero-shot 3DVG framework leveraging 2D Vision-Language Models (VLMs) trained on large-scale 2D data. SeeGround represents 3D scenes as a hybrid of query-aligned rendered images and spatially enriched text descriptions, bridging the gap between 3D data and 2D-VLMs input formats. We propose two modules: the Perspective Adaptation Module, which dynamically selects viewpoints for query-relevant image rendering, and the Fusion Alignment Module, which integrates 2D images with 3D spatial descriptions to enhance object localization. Extensive experiments on ScanRefer and Nr3D demonstrate that our approach outperforms existing zero-shot methods by large margins. Notably, we exceed weakly supervised methods and rival some fully supervised ones, outperforming previous SOTA by 7.7% on ScanRefer and 7.1% on Nr3D, showcasing its effectiveness in complex 3DVG tasks.
comment: CVPR 2025; 21 pages, 10 figures, 10 tables; Code at https://seeground.github.io/
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system's applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code as well as our experiment and deployment videos are open sourced and can be found on our project website: https://dynamem.github.io/
comment: Website: https://dynamem.github.io
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to the microsecond-level temporal resolution and asynchronous operation. Existing event detectors, however, are limited by fixed-frequency paradigms and fail to fully exploit the high-temporal resolution and adaptability of event data. To address these limitations, we propose FlexEvent, a novel framework that enables detection at varying frequencies. Our approach consists of two key components: FlexFuse, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FlexTune, a frequency-adaptive fine-tuning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems. Code is publicly available.
comment: Preprint; 27 pages, 14 figures, 10 tables; Code at https://flexevent.github.io/
SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT's intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
Information Entropy Guided Height-aware Histogram for Quantization-friendly Pillar Feature Encoder
Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along height dimension and large numerical distribution difference during pillar feature encoding (PFE), which severely limits their performance and quantization potential. To address above issue, we first unveil the importance of different input information during PFE and identify the height dimension as a key factor in enhancing 3D detection performance. Motivated by this observation, we propose a height-aware pillar feature encoder, called PillarHist. Specifically, PillarHist statistics the discrete distribution of points at different heights within one pillar with the information entropy guidance. This simple yet effective design greatly preserves the information along the height dimension while significantly reducing the computation overhead of the PFE. Meanwhile, PillarHist also constrains the arithmetic distribution of PFE input to a stable range, making it quantization-friendly. Notably, PillarHist operates exclusively within the PFE stage to enhance performance, enabling seamless integration into existing pillar-based methods without introducing complex operations. Extensive experiments show the effectiveness of PillarHist in terms of both efficiency and performance.
USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner
In the era of Large Language Models (LLMs), embodied artificial intelligence presents transformative opportunities for robotic manipulation tasks. Ultrasound imaging, a widely used and cost-effective medical diagnostic procedure, faces challenges due to the global shortage of professional sonographers. To address this issue, we propose USPilot, an embodied robotic assistant ultrasound system powered by an LLM-based framework to enable autonomous ultrasound acquisition. USPilot is designed to function as a virtual sonographer, capable of responding to patients' ultrasound-related queries and performing ultrasound scans based on user intent. By fine-tuning the LLM, USPilot demonstrates a deep understanding of ultrasound-specific questions and tasks. Furthermore, USPilot incorporates an LLM-enhanced Graph Neural Network (GNN) to manage ultrasound robotic APIs and serve as a task planner. Experimental results show that the LLM-enhanced GNN achieves unprecedented accuracy in task planning on public datasets. Additionally, the system demonstrates significant potential in autonomously understanding and executing ultrasound procedures. These advancements bring us closer to achieving autonomous and potentially unmanned robotic ultrasound systems, addressing critical resource gaps in medical imaging.
Universal Trajectory Optimization Framework for Differential Drive Robot Class
Differential drive robots are widely used in various scenarios thanks to their straightforward principle, from household service robots to disaster response field robots. There are several types of driving mechanisms for real-world applications, including two-wheeled, four-wheeled skid-steering, tracked robots, and so on. The differences in the driving mechanisms usually require specific kinematic modeling when precise control is desired. Furthermore, the nonholonomic dynamics and possible lateral slip lead to different degrees of difficulty in getting feasible and high-quality trajectories. Therefore, a comprehensive trajectory optimization framework to compute trajectories efficiently for various kinds of differential drive robots is highly desirable. In this paper, we propose a universal trajectory optimization framework that can be applied to differential drive robots, enabling the generation of high-quality trajectories within a restricted computational timeframe. We introduce a novel trajectory representation based on polynomial parameterization of motion states or their integrals, such as angular and linear velocities, which inherently matches the robots' motion to the control principle. The trajectory optimization problem is formulated to minimize complexity while prioritizing safety and operational efficiency. We then build a full-stack autonomous planning and control system to demonstrate its feasibility and robustness. We conduct extensive simulations and real-world testing in crowded environments with three kinds of differential drive robots to validate the effectiveness of our approach.
comment: 15 pages, 16 figures
Global Tensor Motion Planning
Batch planning is increasingly necessary to quickly produce diverse and quality motion plans for downstream learning applications, such as distillation and imitation learning. This paper presents Global Tensor Motion Planning (GTMP) -- a sampling-based motion planning algorithm comprising only tensor operations. We introduce a novel discretization structure represented as a random multipartite graph, enabling efficient vectorized sampling, collision checking, and search. We provide a theoretical investigation showing that GTMP exhibits probabilistic completeness while supporting modern GPU/TPU. Additionally, by incorporating smooth structures into the multipartite graph, GTMP directly plans smooth splines without requiring gradient-based optimization. Experiments on lidar-scanned occupancy maps and the MotionBenchMarker dataset demonstrate GTMP's computation efficiency in batch planning compared to baselines, underscoring GTMP's potential as a robust, scalable planner for diverse applications and large-scale robot learning tasks.
comment: 8 pages, 3 figures. Accepted at IEEE Robotics and Automation Letters 2025
Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving
The end-to-end learning pipeline is gradually creating a paradigm shift in the ongoing development of highly autonomous vehicles (AVs), largely due to advances in deep learning, the availability of large-scale training datasets, and improvements in integrated sensor devices. However, a lack of explainability in real-time decisions with contemporary learning methods impedes user trust and attenuates the widespread deployment and commercialization of such vehicles. Moreover, the issue is exacerbated when these vehicles are involved in or cause traffic accidents. Consequently, explainability in end-to-end autonomous driving is essential to build trust in vehicular automation. With that said, automotive researchers have not yet rigorously explored safety benefits and consequences of explanations in end-to-end autonomous driving. This paper aims to bridge the gaps between these topics and seeks to answer the following research question: What are safety implications of explanations in end-to-end autonomous driving? In this regard, we first revisit established safety and explainability concepts in end-to-end driving. Furthermore, we present critical case studies and show the pivotal role of explanations in enhancing driving safety. Finally, we describe insights from empirical studies and reveal potential value, limitations, and caveats of practical explainable AI methods with respect to their potential impacts on safety of end-to-end driving.
comment: Accepted for publication in IEEE Transactions on Intelligent Transportation Systems
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes
We propose M3Bench, a new benchmark for whole-body motion generation in mobile manipulation tasks. Given a 3D scene context, M3Bench requires an embodied agent to reason about its configuration, environmental constraints, and task objectives to generate coordinated whole-body motion trajectories for object rearrangement. M3Bench features 30,000 object rearrangement tasks across 119 diverse scenes, providing expert demonstrations generated by our newly developed M3BenchMaker, an automatic data generation tool that produces whole-body motion trajectories from high-level task instructions using only basic scene and robot information. Our benchmark includes various task splits to evaluate generalization across different dimensions and leverages realistic physics simulation for trajectory assessment. Extensive evaluation analysis reveals that state-of-the-art models struggle with coordinating base-arm motion while adhering to environmental and task-specific constraints, underscoring the need for new models to bridge this gap. By releasing M3Bench and M3BenchMaker we aim to advance robotics research toward more adaptive and capable mobile manipulation in diverse, real-world environments.
comment: This paper has been accepted by IEEE Robotics and Automation Letters 2025 (RA-L)
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network ICML 2025
Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to "kick-start" training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstration and online-collected data during the training process. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks. Next, it auto-regressively predicts dimensional action advantages within each decision step, enabling more effective decision-making in continuous control tasks. We evaluate ARSQ on two continuous control benchmarks, RLBench and D4RL, integrating demonstration data into online training. On D4RL, which includes non-expert demonstrations, ARSQ achieves an average $1.62\times$ performance improvement over SOTA value-based baseline. On RLBench, which incorporates expert demonstrations, ARSQ surpasses various baselines, demonstrating its effectiveness in learning from suboptimal online-collected data. Project page is at https://sites.google.com/view/ar-soft-q
comment: Accepted by ICML 2025
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, despite leveraging powerful pre-trained Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key capabilities during fine-tuning as the model adapts to specific robotic tasks. We argue that a generalizable VLA model should retain and expand upon the VLM's core competencies: 1) Open-world embodied reasoning - the VLA should inherit the knowledge from VLM, i.e., recognize anything that the VLM can recognize, be capable of solving math problems, and possess visual-spatial intelligence, 2) Reasoning following - effectively translating the open-world reasoning into actionable steps for the robot. In this work, we introduce ChatVLA-2, a novel mixture-of-expert VLA model coupled with a specialized two-stage training pipeline designed to preserve the VLM's original strengths while enabling actionable reasoning. To validate our approach, we design a math-matching task wherein a robot interprets math problems written on a whiteboard and picks corresponding number cards from a table to solve equations. Remarkably, our method exhibits exceptional mathematical reasoning and OCR capabilities, despite these abilities not being explicitly trained within the VLA. Furthermore, we demonstrate that the VLA possesses strong spatial reasoning skills, enabling it to interpret novel directional instructions involving previously unseen objects. Overall, our method showcases reasoning and comprehension abilities that significantly surpass state-of-the-art imitation learning methods such as OpenVLA, DexVLA, and pi-zero. This work represents a substantial advancement toward developing truly generalizable robotic foundation models endowed with robust reasoning capacities.
comment: Project page: https://chatvla-2.github.io/
CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
Multi-modal learning has become a crucial technique for improving the performance of machine learning applications across domains such as autonomous driving, robotics, and perception systems. However, in certain scenarios, particularly in resource-constrained environments, some modalities available during training may be absent during inference. While existing frameworks effectively utilize multiple data sources during training and enable inference with reduced modalities, they are primarily designed for single-agent settings. This poses a critical limitation in dynamic environments such as connected autonomous vehicles (CAV), where incomplete data coverage can lead to decision-making blind spots. Conversely, some works explore multi-agent collaboration but without addressing missing modality at test time. To overcome these limitations, we propose Collaborative Auxiliary Modality Learning (CAML), a novel multi-modal multi-agent framework that enables agents to collaborate and share multi-modal data during training, while allowing inference with reduced modalities during testing. Experimental results in collaborative decision-making for CAV in accident-prone scenarios demonstrate that CAML achieves up to a ${\bf 58.1}\%$ improvement in accident detection. Additionally, we validate CAML on real-world aerial-ground robot data for collaborative semantic segmentation, achieving up to a ${\bf 10.6}\%$ improvement in mIoU.
Foundation Models for Rapid Autonomy Validation
We are motivated by the problem of autonomous vehicle performance validation. A key challenge is that an autonomous vehicle requires testing in every kind of driving scenario it could encounter, including rare events, to provide a strong case for safety and show there is no edge-case pathological behavior. Autonomous vehicle companies rely on potentially millions of miles driven in realistic simulation to expose the driving stack to enough miles to estimate rates and severity of collisions. To address scalability and coverage, we propose the use of a behavior foundation model, specifically a masked autoencoder (MAE), trained to reconstruct driving scenarios. We leverage the foundation model in two complementary ways: we (i) use the learned embedding space to group qualitatively similar scenarios together and (ii) fine-tune the model to label scenario difficulty based on the likelihood of a collision upon simulation. We use the difficulty scoring as importance weighting for the groups of scenarios. The result is an approach which can more rapidly estimate the rates and severity of collisions by prioritizing hard scenarios while ensuring exposure to every kind of driving scenario.
Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic
Neural network-based policies have demonstrated success in many robotic applications, but often lack human-explanability, which poses challenges in safety-critical deployments. To address this, we propose a neuro-symbolic explanation framework that generates a weighted signal temporal logic (wSTL) specification to describe a robot policy in a interpretable form. Existing methods typically produce explanations that are verbose and inconsistent, which hinders explainability, and loose, which do not give meaningful insights into the underlying policy. We address these issues by introducing a simplification process consisting of predicate filtering, regularization, and iterative pruning. We also introduce three novel explainability evaluation metrics -- conciseness, consistency, and strictness -- to assess explanation quality beyond conventional classification metrics. Our method is validated in three simulated robotic environments, where it outperforms baselines in generating concise, consistent, and strict wSTL explanations without sacrificing classification accuracy. This work bridges policy learning with formal methods, contributing to safer and more transparent decision-making in robotics.
comment: This work has been submitted to the IEEE for possible publication
Large-Scale Multi-Robot Coverage Path Planning on Grids with Path Deconfliction
We study Multi-Robot Coverage Path Planning (MCPP) on a 4-neighbor 2D grid G, which aims to compute paths for multiple robots to cover all cells of G. Traditional approaches are limited as they first compute coverage trees on a quadrant coarsened grid H and then employ the Spanning Tree Coverage (STC) paradigm to generate paths on G, making them inapplicable to grids with partially obstructed 2x2 blocks. To address this limitation, we reformulate the problem directly on G, revolutionizing grid-based MCPP solving and establishing new NP-hardness results. We introduce Extended-STC (ESTC), a novel paradigm that extends STC to ensure complete coverage with bounded suboptimality, even when H includes partially obstructed blocks. Furthermore, we present LS-MCPP, a new algorithmic framework that integrates ESTC with three novel types of neighborhood operators within a local search strategy to optimize coverage paths directly on G. Unlike prior grid-based MCPP work, our approach also incorporates a versatile post-processing procedure that applies Multi-Agent Path Finding (MAPF) techniques to MCPP for the first time, enabling a fusion of these two important fields in multi-robot coordination. This procedure effectively resolves inter-robot conflicts and accommodates turning costs by solving a MAPF variant, making our MCPP solutions more practical for real-world applications. Extensive experiments demonstrate that our approach significantly improves solution quality and efficiency, managing up to 100 robots on grids as large as 256x256 within minutes of runtime. Validation with physical robots confirms the feasibility of our solutions under real-world conditions.
comment: accepted to T-RO
Dexterous Control of an 11-DOF Redundant Robot for CT-Guided Needle Insertion With Task-Oriented Weighted Policies
Computed tomography (CT)-guided needle biopsies are critical for diagnosing a range of conditions, including lung cancer, but present challenges such as limited in-bore space, prolonged procedure times, and radiation exposure. Robotic assistance offers a promising solution by improving needle trajectory accuracy, reducing radiation exposure, and enabling real-time adjustments. In our previous work, we introduced a redundant robotic platform designed for dexterous needle insertion within the confined CT bore. However, its limited base mobility restricts flexible deployment in clinical settings. In this study, we present an improved 11-degree-of-freedom (DOF) robotic system that integrates a 6-DOF robotic base with a 5-DOF cable-driven end-effector, significantly enhancing workspace flexibility and precision. With the hyper-redundant degrees of freedom, we introduce a weighted inverse kinematics controller with a two-stage priority scheme for large-scale movement and fine in-bore adjustments, along with a null-space control strategy to optimize dexterity. We validate our system through both simulation and real-world experiments, demonstrating superior tracking accuracy and enhanced manipulability in CT-guided procedures. The study provides a strong case for hyper-redundancy and null-space control formulations for robot-assisted needle biopsy scenarios.
Multiagent Systems
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Developing AI agents capable of collaborating with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches typically adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammate pools with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the distribution of training teammates. In this paper, we present a unified framework for AHT by reformulating the problem as an open-ended learning process between an ad hoc agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Extensive experiments across diverse AHT environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
Collaborative Last-Mile Delivery: A Multi-Platform Vehicle Routing Problem With En-route Charging
The rapid growth of e-commerce and the increasing demand for timely, cost-effective last-mile delivery have increased interest in collaborative logistics. This research introduces a novel collaborative synchronized multi-platform vehicle routing problem with drones and robots (VRP-DR), where a fleet of $\mathcal{M}$ trucks, $\mathcal{N}$ drones and $\mathcal{K}$ robots, cooperatively delivers parcels. Trucks serve as mobile platforms, enabling the launching, retrieving, and en-route charging of drones and robots, thereby addressing critical limitations such as restricted payload capacities, limited range, and battery constraints. The VRP-DR incorporates five realistic features: (1) multi-visit service per trip, (2) multi-trip operations, (3) flexible docking, allowing returns to the same or different trucks (4) cyclic and acyclic operations, enabling return to the same or different nodes; and (5) en-route charging, enabling drones and robots to recharge while being transported on the truck, maximizing operational efficiency by utilizing idle transit time. The VRP-DR is formulated as a mixed-integer linear program (MILP) to minimize both operational costs and makespan. To overcome the computational challenges of solving large-scale instances, a scalable heuristic algorithm, FINDER (Flexible INtegrated Delivery with Energy Recharge), is developed, to provide efficient, near-optimal solutions. Numerical experiments across various instance sizes evaluate the performance of the MILP and heuristic approaches in terms of solution quality and computation time. The results demonstrate significant time savings of the combined delivery mode over the truck-only mode and substantial cost reductions from enabling multi-visits. The study also provides insights into the effects of en-route charging, docking flexibility, drone count, speed, and payload capacity on system performance.
Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems
The communication topology in large language model-based multi-agent systems fundamentally governs inter-agent collaboration patterns, critically shaping both the efficiency and effectiveness of collective decision-making. While recent studies for communication topology automated design tend to construct sparse structures for efficiency, they often overlook why and when sparse and dense topologies help or hinder collaboration. In this paper, we present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity. Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance. Guided by this insight, we propose a novel topology design approach, EIB-leanrner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs. Extensive experiments show the superior effectiveness, communication cost, and robustness of EIB-leanrner.
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration
Large Language Model-based multi-agent systems (MAS) have shown remarkable progress in solving complex tasks through collaborative reasoning and inter-agent critique. However, existing approaches typically treat each task in isolation, resulting in redundant computations and limited generalization across structurally similar tasks. To address this, we introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. We model the task-solving workflow on a graph-structured multi-agent collaboration network, where agents propagate information and coordinate via explicit connectivity. During the experiential learning phase, we quantify the quality for each step in the task-solving workflow and store the resulting rewards along with the corresponding inputs and outputs into each agent's individual experience pool. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step, thereby enabling more accurate and efficient multi-agent collaboration. Experimental results on diverse datasets demonstrate that MAEL empowers agents to learn from prior task experiences effectively-achieving faster convergence and producing higher-quality solutions on current tasks.
comment: Work in Progress
Learning Recommender Mechanisms for Bayesian Stochastic Games
An important challenge in non-cooperative game theory is coordinating on a single (approximate) equilibrium from many possibilities - a challenge that becomes even more complex when players hold private information. Recommender mechanisms tackle this problem by recommending strategies to players based on their reported type profiles. A key consideration in such mechanisms is to ensure that players are incentivized to participate, report their private information truthfully, and follow the recommendations. While previous work has focused on designing recommender mechanisms for one-shot and extensive-form games, these approaches cannot be effectively applied to stochastic games, particularly if we constrain recommendations to be Markov stationary policies. To bridge this gap, we introduce a novel bi-level reinforcement learning approach for automatically designing recommender mechanisms in Bayesian stochastic games. Our method produces a mechanism represented by a parametric function (such as a neural network), and is therefore highly efficient at execution time. Experimental results on two repeated and two stochastic games demonstrate that our approach achieves social welfare levels competitive with cooperative multi-agent reinforcement learning baselines, while also providing significantly improved incentive properties.
MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represent workflows as a verifiable intermediate representation using Mermaid, a structured and human-interpretable graph language. We formulate domain-aware evolutionary operators, i.e., crossover, mutation, insertion, and deletion, to preserve semantic correctness while promoting structural diversity, enabling efficient exploration of a high-quality, statically verifiable workflow space. Without modifying task settings or evaluation protocols, MermaidFlow achieves consistent improvements in success rates and faster convergence to executable plans on the agent reasoning benchmark. The experimental results demonstrate that safety-constrained graph evolution offers a scalable, modular foundation for robust and interpretable agentic reasoning systems.
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve
Recent studies show that LLMs possess different skills and specialize in different tasks. In fact, we observe that their varied performance occur in several levels of granularity. For example, in the code optimization task, code LLMs excel at different optimization categories and no one dominates others. This observation prompts the question of how one leverages multiple LLM agents to solve a coding problem without knowing their complementary strengths a priori. We argue that a team of agents can learn from each other's successes and failures so as to improve their own performance. Thus, a lesson is the knowledge produced by an agent and passed on to other agents in the collective solution process. We propose a lesson-based collaboration framework, design the lesson solicitation--banking--selection mechanism, and demonstrate that a team of small LLMs with lessons learned can outperform a much larger LLM and other multi-LLM collaboration methods.
Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease
Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies that appeared reproducible using this dataset alone. Using GPT-4o, we created a simulated research team of LLM-based autonomous agents tasked with writing and executing code to dynamically reproduce the findings of each study, given only study Abstracts, Methods sections, and data dictionary descriptions of the dataset. Results: We extracted 35 key findings described in the Abstracts across 5 Alzheimer's studies. On average, LLM agents approximately reproduced 53.2% of findings per study. Numeric values and range-based findings often differed between studies and agents. The agents also applied statistical methods or parameters that varied from the originals, though overall trends and significance were sometimes similar. Discussion: In some cases, LLM-based agents replicated research techniques and findings. In others, they failed due to implementation flaws or missing methodological detail. These discrepancies show the current limits of LLMs in fully automating reproducibility assessments. Still, this early investigation highlights the potential of structured agent-based systems to provide scalable evaluation of scientific rigor. Conclusion: This exploratory work illustrates both the promise and limitations of LLMs as autonomous agents for automating reproducibility in biomedical research.
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets
AI agents are increasingly used in consumer-facing applications to assist with tasks such as product search, negotiation, and transaction execution. In this paper, we explore a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. We aim to answer two key questions: (1) Do different LLM agents vary in their ability to secure favorable deals for users? (2) What risks arise from fully automating deal-making with AI agents in consumer markets? To address these questions, we develop an experimental framework that evaluates the performance of various LLM agents in real-world negotiation and transaction settings. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game -- different agents achieve significantly different outcomes for their users. Moreover, behavioral anomalies in LLMs can result in financial losses for both consumers and merchants, such as overspending or accepting unreasonable deals. These results underscore that while automation can improve efficiency, it also introduces substantial risks. Users should exercise caution when delegating business decisions to AI agents.
Literature Review Of Multi-Agent Debate For Problem-Solving
Multi-agent large language models (MA-LLMs) are a rapidly growing research area that leverages multiple interacting language agents to tackle complex tasks, outperforming single-agent large language models. This literature review synthesizes the latest research on agent profiles, communication structures, and decision-making processes, drawing insights from both traditional multi-agent systems and state-of-the-art MA-LLM studies. In doing so, it aims to address the lack of direct comparisons in the field, illustrating how factors like scalability, communication structure, and decision-making processes influence MA-LLM performance. By examining frequent practices and outlining current challenges, the review reveals that multi-agent approaches can yield superior results but also face elevated computational costs and under-explored challenges unique to MA-LLM. Overall, these findings provide researchers and practitioners with a roadmap for developing robust and efficient multi-agent AI solutions.
comment: 11 pages, 2 figures
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. Nevertheless, the question of how agents should be structurally organized for optimal cooperation remains largely unexplored. In this position paper, we aim to gently redirect the focus of the MAS research community toward this critical dimension: develop topology-aware MASs for specific tasks. Specifically, the system consists of three core components - agents, communication links, and communication patterns - that collectively shape its coordination performance and efficiency. To this end, we introduce a systematic, three-stage framework: agent selection, structure profiling, and topology synthesis. Each stage would trigger new research opportunities in areas such as language models, reinforcement learning, graph learning, and generative modeling; together, they could unleash the full potential of MASs in complicated real-world applications. Then, we discuss the potential challenges and opportunities in the evaluation of multiple systems. We hope our perspective and framework can offer critical new insights in the era of agentic AI.
The challenge of hidden gifts in multi-agent reinforcement learning
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a "hidden gift". We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of "hidden gifts", and demonstrate that learning awareness in independent agents can benefit these settings.
MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs
LLM-based multi-agent systems (MAS) have shown promise in tackling complex tasks. However, existing solutions often suffer from limited agent coordination and heavy reliance on predefined Standard Operating Procedures (SOPs), which demand extensive human input. To address these limitations, we propose MegaAgent, a large-scale autonomous LLM-based multi-agent system. MegaAgent generates agents based on task complexity and enables dynamic task decomposition, parallel execution, efficient communication, and comprehensive system monitoring of agents. In evaluations, MegaAgent demonstrates exceptional performance, successfully developing a Gobang game within 800 seconds and scaling up to 590 agents in a national policy simulation to generate multi-domain policies. It significantly outperforms existing systems, such as MetaGPT, in both task completion efficiency and scalability. By eliminating the need for predefined SOPs, MegaAgent demonstrates exceptional scalability and autonomy, setting a foundation for advancing true autonomy in MAS. Our code is available at https://github.com/Xtra-Computing/MegaAgent .
Agentic Knowledgeable Self-awareness ACL 2025
Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness during decision-making-the ability to dynamically assess situational demands and strategically employ resources during decision-making. We propose agentic knowledgeable self-awareness to address this gap, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that applies agents with knowledgeable self-awareness like humans. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent's self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.
comment: ACL 2025
Emergent social conventions and collective bias in LLM populations
Social conventions are the backbone of social coordination, shaping how individuals form a group. As growing populations of artificial intelligence (AI) agents communicate through natural language, a fundamental question is whether they can bootstrap the foundations of a society. Here, we present experimental results that demonstrate the spontaneous emergence of universally adopted social conventions in decentralized populations of large language model (LLM) agents. We then show how strong collective biases can emerge during this process, even when agents exhibit no bias individually. Last, we examine how committed minority groups of adversarial LLM agents can drive social change by imposing alternative social conventions on the larger population. Our results show that AI systems can autonomously develop social conventions without explicit programming and have implications for designing AI systems that align, and remain aligned, with human values and societal goals.
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game ICML 2024
Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model's training data and results in suboptimal performance. To develop strategic language agents, i.e., agents that generate flexible language actions and possess strong decision-making abilities, we propose a novel framework that powers LLM-based agents with reinforcement learning (RL). We consider Werewolf, a popular social deduction game, as a challenging testbed that emphasizes versatile communication and strategic gameplay. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Then an RL policy trained to optimize the decision-making ability chooses an action from the candidates to play in the game. Extensive experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game. We also conduct human-agent experiments and find that our agents achieve human-level performance and demonstrate strong strategic play.
comment: Published in ICML 2024
MedRAX: Medical Reasoning Agent for Chest X-ray
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code have been publicly available at https://github.com/bowang-lab/MedRAX
comment: 16 pages, 4 figures, 5 Tables
AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
The rapid advancement of large language models (LLMs) has enabled the development of multi-agent systems where multiple LLM-based agents collaborate on complex tasks. However, existing systems often rely on centralized coordination, leading to scalability bottlenecks, reduced adaptability, and single points of failure. Privacy and proprietary knowledge concerns further hinder cross-organizational collaboration, resulting in siloed expertise. We propose AgentNet, a decentralized, Retrieval-Augmented Generation (RAG)-based framework that enables LLM-based agents to specialize, evolve, and collaborate autonomously in a dynamically structured Directed Acyclic Graph (DAG). Unlike prior approaches with static roles or centralized control, AgentNet allows agents to adjust connectivity and route tasks based on local expertise and context. AgentNet introduces three key innovations: (1) a fully decentralized coordination mechanism that eliminates the need for a central orchestrator, enhancing robustness and emergent intelligence; (2) dynamic agent graph topology that adapts in real time to task demands, ensuring scalability and resilience; and (3) a retrieval-based memory system for agents that supports continual skill refinement and specialization. By minimizing centralized control and data exchange, AgentNet enables fault-tolerant, privacy-preserving collaboration across organizations. Experiments show that AgentNet achieves higher task accuracy than both single-agent and centralized multi-agent baselines.
LLM Agents Making Agent Tools ACL 2025
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains demanding large numbers of highly specialised tools, like in life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, an agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a GitHub URL and short task description, ToolMaker autonomously installs dependencies and generates code to perform the task, using a closed-loop self-correction mechanism for debugging. To evaluate our approach, we introduce a benchmark comprising 15 complex computational tasks spanning various domains with over 100 unit tests to assess correctness and robustness. Our method correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker therefore is a step towards fully autonomous agent-based scientific workflows. Our code and benchmark are publicly available at https://github.com/KatherLab/ToolMaker.
comment: Accepted at ACL 2025
Systems and Control (CS)
From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems
This paper envisions 6G as a self-evolving telecom ecosystem, where AI-driven intelligence enables dynamic adaptation beyond static connectivity. We explore the key enablers of autonomous communication systems, spanning reconfigurable infrastructure, adaptive middleware, and intelligent network functions, alongside multi-agent collaboration for distributed decision-making. We explore how these methodologies align with emerging industrial IoT frameworks, ensuring seamless integration within digital manufacturing processes. Our findings emphasize the potential for improved real-time decision-making, optimizing efficiency, and reducing latency in networked control systems. The discussion addresses ethical challenges, research directions, and standardization efforts, concluding with a technology stack roadmap to guide future developments. By leveraging state-of-the-art 6G network management techniques, this research contributes to the next generation of intelligent automation solutions, bridging the gap between theoretical advancements and real-world industrial applications.
Decoupling Periodic Systems: An Algebraic Approach
This paper addresses the problem of row-by-row (or diagonal) decoupling of discrete-time linear multi-input multi-output systems with periodic time-varying coefficients using periodic state feedback. Previous solutions have tackled row-by-row decoupling using dynamic compensation for square systems and block-decoupling through regular state feedback for nonsquare systems with more outputs than inputs. While it appears likely that a row-by-row state feedback solution for square systems can be deduced from these findings, a direct argument seems more appropriate here as it presents a natural extension for decoupling nonsquare systems with more inputs than outputs. This extension, which necessitates nonregular state feedback, has yet to be explored for periodic systems. Our approach is purely algebraic, based on a time-invariant representation of the periodic system.
comment: 8 pages, submitted February 5, 2025, to IEEE Transactions on Automatic Control
Integrated design of system structure and delayed resonator towards efficient non-collocated vibration absorption
The problem of non-collocated vibration absorption by a delayed resonator is addressed with emphasis on system fatigue resistance and energy efficiency of control actions. The analysis is performed for a system consisting of an arbitrary large series of flexibly linked single-degree-of-freedom masses. For the stage where the vibration of the target mass is fully absorbed by the non-collocated resonator, key forces, motion amplitudes and potential energies across the system structure are assessed. Next, a complete parameter set of the resonator gain and delay is derived, and the actuation force and power needed by the resonator for the full vibration absorption is determined. The derived quantities are utilized in forming an optimization problem to balance minimal risk of fatigue across the system structure and power needed by the resonator, under the closed loop stability and parameter constraints. Next to the gain and delay of the resonator, selected structural parameters of the system are used as variables in the constrained nonlinear optimization problem. Experimental and numerical case studies are included to demonstrate benefits of the proposed integrated structural and control design.
comment: 41 pages, 8 figures, 4 tables, Accepted 4 April 2025, Available online 18 April 2025, Version of Record 6 May 2025
Robust Aperiodic Sampled-Data Washout Control for Uncertain Affine Systems
In this paper, we address the problem of designing an aperiodic sampled-data controller stabilizing the zero-input equilibrium of an uncertain affine plant. The closed-loop system is modeled as a hybrid dynamical system incorporating a timer triggering the occurrence of the sampling events and two memory states storing the value of the controller state and controller output at each sampling time. Necessary and sufficient conditions on the controller parameters are given to establish the sought property. A constructive controller design algorithm based on sum-of-squares programming is given. A numerical example illustrates the effectiveness of the approach.
comment: extended version of the paper presented at IFAC ROCOND 2025
Long Duration Inspection of GNSS-Denied Environments with a Tethered UAV-UGV Marsupial System
Unmanned Aerial Vehicles (UAVs) have become essential tools in inspection and emergency response operations due to their high maneuverability and ability to access hard-to-reach areas. However, their limited battery life significantly restricts their use in long-duration missions. This paper presents a novel tethered marsupial robotic system composed of a UAV and an Unmanned Ground Vehicle (UGV), specifically designed for autonomous, long-duration inspection tasks in Global Navigation Satellite System (GNSS)-denied environments. The system extends the UAV's operational time by supplying power through a tether connected to high-capacity battery packs carried by the UGV. We detail the hardware architecture based on off-the-shelf components to ensure replicability and describe our full-stack software framework, which is composed of open-source components and built upon the Robot Operating System (ROS). The proposed software architecture enables precise localization using a Direct LiDAR Localization (DLL) method and ensures safe path planning and coordinated trajectory tracking for the integrated UGV-tether-UAV system. We validate the system through three field experiments: (1) a manual flight endurance test to estimate the operational duration, (2) an autonomous navigation test, and (3) an inspection mission to demonstrate autonomous inspection capabilities. Experimental results confirm the robustness and autonomy of the system, its capacity to operate in GNSS-denied environments, and its potential for long-endurance, autonomous inspection and monitoring tasks.
comment: 30 pages, 15 figures, 3 tables, 1 algorithm. Submitted to Journal of Intelligent & Robotic Systems
Evaluation of Voltage Unbalance Metrics in Distribution Networks with High DER Penetration
Voltage unbalance, caused by variations in voltage magnitude and phase angle, is a significant power quality issue in three-phase systems, leading to equipment inefficiencies and increased system losses. The integration of distributed energy resources (DER) into the grid adds complexity, as DER can either reduce or worsen voltage unbalance, depending on factors such as grid configuration and the distribution of loads and DER themselves. This study explores the effects of DER penetration on voltage unbalance levels and the accuracy of the different indices most commonly used to quantify this unbalance. The results highlight the varying impacts of DER on unbalance and index performance, emphasizing the need for effective strategies to assess voltage unbalance in modern distribution systems.
CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent latency-accuracy trade-off under resource constraints. Existing real-time DNN scheduling approaches often treat models generically, failing to leverage Transformer-specific properties for efficient resource allocation. To address these challenges, we propose CF-DETR, an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework NPFP**. CF-DETR employs three key strategies (A1: coarse-to-fine inference, A2: selective fine inference, A3: multi-level batch inference) that exploit Transformer properties to dynamically adjust patch granularity and attention scope based on object criticality, aiming to satisfy R2. The NPFP** scheduling framework (A4) orchestrates these adaptive mechanisms A1-A3. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (ensuring R1), and an optional fine subtask for enhanced overall accuracy (R2), while managing individual and batched execution. Our extensive evaluations on server, GPU-enabled embedded platforms, and actual AV platforms demonstrate that CF-DETR, under an NPFP** policy, successfully meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy compared to existing baselines across diverse AV workloads.
comment: 12 pages
Optimizing Connectivity and Scheduling of Near/Far Field Users in Massive MIMO NOMA System
It is envisioned that the next generations of wireless communication environment will be characterized with dense traffic demand due to the prediction that there will be large numbers of active users. Hence, it is important to find a solution to deal with such dense numbers of users. This paper investigates optimizing the connectivity and users scheduling to improve the performance of near and far field users in a downlink, multiuser, massive MIMO-NOMA system. For the considered system model, combining NOMA side by side with massive MIMO offers a great opportunity to exploit the available radio resources and boost the overall system efficiency. The paper proposes separate clustering of near field users and far field users. It also proposes using a beamforming scheme to separately serve the users within each cluster. However, NOMA is proposed to be applied among all users to boost resource sharing. In particular, a cognitive-NOMA beamforming scheme and NOMA themed beamforming are proposed to serve the users within each cluster, and they are compared against random beamforming from literature. Simulation results show that both of the proposed beamforming schemes proved their superiority as compared to random beamforming. Several scheduling techniques were also considered in this paper to examine possible solutions for boosting the system performance considered, namely, priority, joint, dynamic, and fairness-based scheduling techniques for both near field and far field users. The paper also proposes a suboptimal, fairness aiming and gradual allocation approach for allocating the transmission power among the users. The results show that user-clustering offers better connectivity and scheduling performance than the case where no clustering is applied.
Latent Representations for Control Design with Provable Stability and Safety Guarantees
We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques -- such as Lyapunov or barrier functions -- that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Importantly, our analysis contains several important implications for learning latent spaces and dynamics, by highlighting the necessary geometric properties which need to be preserved by the latent space, in addition to providing concrete loss functions for dynamics reconstruction that are directly related to control design. We conclude by demonstrating the applicability of our theory to two case studies: (1) stabilization of a cartpole system, and (2) collision avoidance for a two vehicle system.
System Identification for Virtual Sensor-Based Model Predictive Control: Application to a 2-DoF Direct-Drive Robotic Arm
Nonlinear Model Predictive Control (NMPC) offers a powerful approach for controlling complex nonlinear systems, yet faces two key challenges. First, accurately modeling nonlinear dynamics remains difficult. Second, variables directly related to control objectives often cannot be directly measured during operation. Although high-cost sensors can acquire these variables during model development, their use in practical deployment is typically infeasible. To overcome these limitations, we propose a Predictive Virtual Sensor Identification (PVSID) framework that leverages temporary high-cost sensors during the modeling phase to create virtual sensors for NMPC implementation. We validate PVSID on a Two-Degree-of-Freedom (2-DoF) direct-drive robotic arm with complex joint interactions, capturing tip position via motion capture during modeling and utilize an Inertial Measurement Unit (IMU) in NMPC. Experimental results show our NMPC with identified virtual sensors achieves precise tip trajectory tracking without requiring the motion capture system during operation. PVSID offers a practical solution for implementing optimal control in nonlinear systems where the measurement of key variables is constrained by cost or operational limitations.
comment: 6 pages, 5 figures, submitted to L-CSS with CDC 2025 option
Interturn Fault Detection in IPMSMs: Two Adaptive Observer-based Solutions
In this paper we address the problem of online detection of inter-turn short-circuit faults (ITSCFs) that occur in permanent magnet synchronous motors (PMSMs). We propose two solutions to this problem: (i) a very simple linear observer and (ii) a generalized parameter estimation based observer, that incorporates a high performance estimator -- with both observers detecting the short-circuit current and the fault intensity. Although the first solution guarantees the detection of the fault exponentially fast, the rate of convergence is fully determined by the motor parameters that, in some cases, may be too slow. The second observer, on the other hand, ensures finite convergence time under the weakest assumption of interval excitation. To make the observers adaptive, we develop a parameter estimator that, in the case of isotropic PMSMs, estimates on-line (exponentially fast) the resistance and inductance of the motor. It should be underscored that, in contrast with existing observers (including the widely popular Kalman filter) that provide indirect information of the fault current, our observers provide explicit one -- namely the amplitude of the fault current. The performance of both observers, in their linear and generalized parameter estimation-based versions, is illustrated with realistic simulation studies.
Voltage Control of the Boost Converter: PI vs. Nonlinear Passivity-based Control
We carry-out a detailed analysis of direct voltage control of a Boost converter feeding a simple resistive load. First, we prove that using a classical PI control to stabilize a desired equilibrium leads to a very complicated dynamic behavior consisting of two equilibrium points, one of them always unstable for all PI gains and circuit parameter values. Interestingly, the second equilibrium point may be rendered stable -- but for all tuning gains leading to an extremely large value of the circuit current and the controller integrator state. Moreover, if we neglect the resistive effect of the inductor, there is only one equilibrium and it is always unstable. From a practical point of view, it is important to note that the only useful equilibrium point is that of minimum current and that, in addition, there is always a resistive component in the inductor either by its parasitic resistance or by the resistive component of the output impedance of the previous stage. In opposition to this troublesome scenario we recall three nonlinear voltage-feedback controllers, that ensure asymptotic stability of the desired equilibrium with simple gain tuning rules, an easily defined domain of attraction and smooth transient behavior. Two of them are very simple, nonlinear, static voltage feedback rules, while the third one is a variation of the PID scheme called PID-Passivity-based Control (PBC). In its original formulation PID-PBC requires full state measurement, but we present a modified version that incorporates a current observer. All three nonlinear controllers are designed following the principles of PBC, which has had enormous success in many engineering applications.
Detecting Switching Attacks On Traffic Flow Regulation For Changing Driving Patterns
Modern traffic management systems increasingly adopt hierarchical control strategies for improved efficiency and scalability, where a local traffic controller mode is chosen by a supervisory controller based on the changing large-scale driving patterns. Unfortunately, such local metering controllers are also vulnerable to cyberattacks that can disrupt the controller switching, leading to undesired, inefficient, and even unsafe traffic operations. Additionally, the detection of such attacks becomes challenging when the operational mode of the traffic is uncertain and the operational mode identification is delayed. Thus, in this work, we propose a cyberattack detection scheme to detect the compromised controller switching in ramp metering for an uncertain, multimodal macroscopic traffic operation of a freeway segment. In particular, we propose a bank of detectors corresponding to each admissible traffic mode that can compensate for the uncertain traffic mode of the freeway. Furthermore, we utilize backstepping tools along with Lyapunov function theory to achieve analytical performance guarantees for the detector, such as nominal exponential stability, anomaly/uncertainty-to-residual stability, robustness, and sensitivity. Finally, we demonstrate the efficacy of the proposed detection scheme through simulations of free traffic under realistic traffic parameters, uncertainties, and commonly occurring attack scenarios.
comment: 17 pages, 5 figures, 3 tables
Sensitivity of DC Network Representation for GIC Analysis
Geomagnetic disturbances are a threat to the reliability and security of our national critical energy infrastructures. These events specifically result in geomagnetically induced currents, which can cause damage to transformers due to magnetic saturation. In order to mitigate these effects, blocker devices must be placed in optimal locations. Finding this placement requires a dc representation of the ac transmission lines, which this paper discusses. Different decisions in this process, including the method of representing the blocking devices, result in significant variations to the power loss calculations. To analyze these effects, we conclude the paper by comparing the losses on a sample network with different modeling implementations.
comment: 6 pages, 1 table, 7 figures
Categorical Lyapunov Theory II: Stability of Systems
Lyapunov's theorem provides a foundational characterization of stable equilibrium points in dynamical systems. In this paper, we develop a framework for stability for F-coalgebras. We give two definitions for a categorical setting in which we can study the stability of a coalgebra for an endofunctor F. One is minimal and better suited for concrete settings, while the other is more intricate and provides a richer theory. We prove a Lyapunov theorem for both notions of setting for stability, and a converse Lyapunov theorem for the second.
comment: 33 pages
A Benchmark Reference for ESP32-CAM Module SP32
The ESP32-CAM is one of the most widely adopted open-source modules for prototyping embedded vision applications. Since its release in 2019, it has gained popularity among both hobbyists and professional developers due to its affordability, versatility, and integrated wireless capabilities. Despite its widespread use, comprehensive documentation of the performance metrics remains limited. This study addresses this gap by collecting and analyzing over six hours of real-time video streaming logs across all supported resolutions of the OV2640 image sensor, tested under five distinct voltage conditions via an HTTP-based WiFi connection. A long standing bug in the official Arduino ESP32 driver, responsible for inaccurate frame rate logging, was fixed. The resulting analysis includes key performance metrics such as instantaneous and average frame rate, total streamed data, transmission count, and internal chip temperature. The influence of varying power levels was evaluated to assess the reliability of the module.
comment: Full work available at GitHub: https://github.com/TNeutron/ESP32-CAM-Performence-Reference-Benchmark
A Conceptual Introduction to Hetero-functional Graph Theory for Systems-of-Systems
A defining feature of twenty first century engineering challenges is their inherent complexity, demanding the convergence of knowledge across diverse disciplines. Establishing consistent methodological foundations for engineering systems remains a challenge -- one that both systems engineering and network science have sought to address. Model-based systems engineering (MBSE) has recently emerged as a practical, interdisciplinary approach for developing complex systems from concept through implementation. In contrast, network science focuses on the quantitative analysis of networks present within engineering systems. This paper introduces hetero-functional graph theory (HFGT) as a conceptual bridge between these two fields, serving as a tutorial for both communities. For systems engineers, HFGT preserves the heterogeneity of conceptual and ontological constructs in MBSE, including system form, function, and concept. For network scientists, it provides multiple graph-based data structures enabling matrix-based quantitative analysis. The modeling process begins with ontological foundations, defining an engineering system as an abstraction and representing it with a model. Model fidelity is assessed using four linguistic properties: soundness, completeness, lucidity, and laconicity. A meta-architecture is introduced to manage the convergence challenges between domain-specific reference architectures and case-specific instantiations. Unlike other meta-architectures, HFGT is rooted in linguistic structures, modeling resources as subjects, system processes as predicates, and operands-such as matter, energy, organisms, information, and money-as objects. These elements are integrated within a system meta-architecture expressed in the Systems Modeling Language (SysML). The paper concludes by offering guidance for further reading.
comment: arXiv admin note: text overlap with arXiv:2101.07220
A Hetero-functional Graph Theory Perspective of Engineering Management of Mega-Projects
Megaprojects are large-scale, complex, and one-off engineering endeavors that require significant investments from a public or private sector. Such projects generally cost more than a billion dollars, take many years to develop and construct, involve stakeholders both in the public and private sectors, and impact millions of people. Most of the extant megaproject research is concerned with understanding why the engineering management of megaprojects fails so frequently and which dimensions make them so difficult to manage, including size, uncertainty, complexity, urgency, and institutional structure \cite{denicol:2020:00}. Recently, the literature on mega-projects has advocated for a convergence of the engineering management and production system management literature. To that end, this paper proposes the use of Model-Based System Engineering (MBSE) and Hetero-Functional Graph Theory (HFGT), where the latter, quite interestingly, finds its origins in the mass-customized production system literature. More specifically, HFGT was developed so that the physical and informatic parts of production system planning, operations, and decision-making are readily reconfigured to support production customization at scale. As the literature on megaprojects is rapidly evolving with a significant amount of divergence between authors, this report builds upon the recent and extensive megaproject literature review provided by Denicol et. al. \cite{denicol:2020:00}. The paper concludes that MBSE and HFGT provide a means for addressing many of the concluding recommendations provided by Denicol et. al. MBSE and HFGT not only align with current research on megaprojects but also push the boundaries of how the engineering management of megaprojects can gain a unified theoretical foundation.
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*
Graph search planners have been widely used for 3D path planning in the literature, and Euclidean Distance Fields (EDFs) are increasingly being used as a representation of the environment. However, to the best of our knowledge, the integration of EDFs into heuristic planning has been carried out in a loosely coupled fashion, dismissing EDF properties that can be used to accelerate/improve the planning process and enhance the safety margins of the resultant trajectories. This paper presents a fast graph search planner based on a modified Lazy Theta* planning algorithm for aerial robots in challenging 3D environments that exploits the EDF properties. The proposed planner outperforms classic graph search planners in terms of path smoothness and safety. It integrates EDFs as environment representation and directly generates fast and smooth paths avoiding the use of post-processing methods; it also considers the analytical properties of EDFs to obtain an approximation of the EDF cost along the line-of-sight segments and to reduce the number of visibility neighbours, which directly impacts the computation time. Moreover, we demonstrate that the proposed EDF-based cost function satisfies the triangle inequality, which reduces calculations during exploration and, hence, computation time. Many experiments and comparatives are carried out in 3D challenging indoor and outdoor simulation environments to evaluate and validate the proposed planner. The results show an efficient and safe planner in these environments.
An Advanced Cyber-Physical System Security Testbed for Substation Automation
A Cyber-Physical System (CPS) testbed serves as a powerful platform for testing and validating cyber intrusion detection and mitigation strategies in substations. This study presents the design and development of a CPS testbed that can effectively assess the real-time dynamics of a substation. Cyber attacks exploiting IEC 61850-based SV and GOOSE protocols are demonstrated using the testbed, along with an analysis on attack detection. Realistic timing measurements are obtained, and the time frames for deploying detection and mitigation strategies are evaluated.
comment: CIGRE Symposium 2025, Trondheim, Norway
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
We extend the generalized global Hartman-Grobman theorem by Kvalheim and Sontag for flows to a case of asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 4 pages, comments are welcome. V2 contains slightly more details regarding the manifold case
Asymptotic stability equals exponential stability -- while you twist your eyes
Suppose that two vector fields on a smooth manifold render some equilibrium point globally asymptotically stable (GAS). We show that there exists a homotopy between the corresponding semiflows such that this point remains GAS along this homotopy.
comment: Update: condensed version (25 pages) including a new section on ISS
LLM-Enhanced Symbolic Control for Safety-Critical Applications
Motivated by Smart Manufacturing and Industry 4.0, we introduce a framework for synthesizing Abstraction-Based Controller Design (ABCD) for reach-avoid problems from Natural Language (NL) specifications using Large Language Models (LLMs). A Code Agent interprets an NL description of the control problem and translates it into a formal language interpretable by state-of-the-art symbolic control software, while a Checker Agent verifies the correctness of the generated code and enhances safety by identifying specification mismatches. Evaluations show that the system handles linguistic variability and improves robustness over direct planning with LLMs. The proposed approach lowers the barrier to formal control synthesis by enabling intuitive, NL-based task definition while maintaining safety guarantees through automated validation.
SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT's intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
Hierarchical Neuro-Symbolic Decision Transformer
We present a hierarchical neuro-symbolic control framework that tightly couples a classical symbolic planner with a transformer-based policy to address long-horizon decision-making under uncertainty. At the high level, the planner assembles an interpretable sequence of operators that guarantees logical coherence with task constraints, while at the low level each operator is rendered as a sub-goal token that conditions a decision transformer to generate fine-grained actions directly from raw observations. This bidirectional interface preserves the combinatorial efficiency and explainability of symbolic reasoning without sacrificing the adaptability of deep sequence models, and it permits a principled analysis that tracks how approximation errors from both planning and execution accumulate across the hierarchy. Empirical studies in stochastic grid-world domains demonstrate that the proposed method consistently surpasses purely symbolic, purely neural and existing hierarchical baselines in both success and efficiency, highlighting its robustness for sequential tasks.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines represent complex highly nonlinear dynamical systems. Deriving their physics-based models can be challenging as it requires performance characteristics, that are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in a better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.
comment: 38 pages, 28 figures
Quantitative Verification with Neural Networks
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.
comment: The conference version of this manuscript appeared at CONCUR 2023
Global Tensor Motion Planning
Batch planning is increasingly necessary to quickly produce diverse and quality motion plans for downstream learning applications, such as distillation and imitation learning. This paper presents Global Tensor Motion Planning (GTMP) -- a sampling-based motion planning algorithm comprising only tensor operations. We introduce a novel discretization structure represented as a random multipartite graph, enabling efficient vectorized sampling, collision checking, and search. We provide a theoretical investigation showing that GTMP exhibits probabilistic completeness while supporting modern GPU/TPU. Additionally, by incorporating smooth structures into the multipartite graph, GTMP directly plans smooth splines without requiring gradient-based optimization. Experiments on lidar-scanned occupancy maps and the MotionBenchMarker dataset demonstrate GTMP's computation efficiency in batch planning compared to baselines, underscoring GTMP's potential as a robust, scalable planner for diverse applications and large-scale robot learning tasks.
comment: 8 pages, 3 figures. Accepted at IEEE Robotics and Automation Letters 2025
Geometry of the Feasible Output Regions of Grid-Interfacing Inverters with Current Limits
Many resources in the grid connect to power grids via programmable grid-interfacing inverters that can provide grid services and offer greater control flexibility and faster response times compared to synchronous generators. However, the current through the inverter needs to be limited to protect the semiconductor components. Existing controllers are designed using somewhat ad hoc methods, for example, by adding current limiters to preexisting control loops, which can lead to stability issues or overly conservative operations. In this paper, we study the geometry of the feasible output region of a current-limited inverter. We show that under a commonly used model, the feasible region is convex. We provide an explicit characterization of this region, which allows us to efficiently find the optimal operating points of the inverter. We demonstrate how knowing the feasible set and its convexity allows us to improve upon existing grid-forming inverters such that their steady-state currents always remain within the current magnitude limit, whereas standard grid-forming controllers can lead to instabilities and violations.
comment: To appear in IEEE L-CSS
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret
We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
comment: Accepted to be presented at L4DC'25 (Oral)
Constraints and Variables Reduction for Optimal Power Flow Using Hierarchical Graph Neural Networks with Virtual Node-Splitting
Power system networks are often modeled as homogeneous graphs, which limits the ability of graph neural network (GNN) to capture individual generator features at the same nodes. By introducing the proposed virtual node-splitting strategy, generator-level attributes like costs, limits, and ramp rates can be fully captured by GNN models, improving GNN's learning capacity and prediction accuracy. Optimal power flow (OPF) problem is used for real-time grid operations. Limited timeframe motivates studies to create size-reduced OPF (ROPF) models to relieve the computational complexity. In this paper, with virtual node-splitting, a novel two-stage adaptive hierarchical GNN is developed to (i) predict critical lines that would be congested, and then (ii) predict base generators that would operate at the maximum capacity. This will substantially reduce the constraints and variables needed for OPF, creating the proposed ROPFLG model with reduced monitor lines and reduced generator-specific variables and constraints. Two ROPF models, ROPFL and ROPFG, with just reduced lines or generators respectively, are also implemented as additional benchmark models. Case studies show that the proposed ROPFLG consistently outperforms the benchmark full OPF (FOPF) and the other two ROPF methods, achieving significant computational time savings while reliably finding optimal solutions.
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach IJCAI 2025
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA) scheme, aiming at precisely introducing the extent of poisoning in a subtle controlled manner. It operates with a predefined objective, such as reducing global model's prediction accuracy by 10%. FedSA integrates robust nonlinear control-Sliding Mode Control (SMC) theory with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
comment: This paper is to appear in IJCAI 2025, code available at: https://github.com/Halsey777/FedSA
Dexterous Control of an 11-DOF Redundant Robot for CT-Guided Needle Insertion With Task-Oriented Weighted Policies
Computed tomography (CT)-guided needle biopsies are critical for diagnosing a range of conditions, including lung cancer, but present challenges such as limited in-bore space, prolonged procedure times, and radiation exposure. Robotic assistance offers a promising solution by improving needle trajectory accuracy, reducing radiation exposure, and enabling real-time adjustments. In our previous work, we introduced a redundant robotic platform designed for dexterous needle insertion within the confined CT bore. However, its limited base mobility restricts flexible deployment in clinical settings. In this study, we present an improved 11-degree-of-freedom (DOF) robotic system that integrates a 6-DOF robotic base with a 5-DOF cable-driven end-effector, significantly enhancing workspace flexibility and precision. With the hyper-redundant degrees of freedom, we introduce a weighted inverse kinematics controller with a two-stage priority scheme for large-scale movement and fine in-bore adjustments, along with a null-space control strategy to optimize dexterity. We validate our system through both simulation and real-world experiments, demonstrating superior tracking accuracy and enhanced manipulability in CT-guided procedures. The study provides a strong case for hyper-redundancy and null-space control formulations for robot-assisted needle biopsy scenarios.
Online ResNet-Based Adaptive Control for Nonlinear Target Tracking
A generalized ResNet architecture for adaptive control of nonlinear systems with black box uncertainties is developed. The approach overcomes limitations in existing methods by incorporating pre-activation shortcut connections and a zeroth layer block that accommodates different input-output dimensions. The developed Lyapunov-based adaptation law establishes exponential convergence to a neighborhood of the target state despite unknown dynamics and disturbances. Furthermore, the theoretical results are validated through a comparative experiment.
comment: 6 pages, 3 figures
Distributed RISE-based Control for Exponential Heterogeneous Multi-Agent Target Tracking of Second-Order Nonlinear Systems
A distributed implementation of a Robust Integral of the Sign of the Error (RISE) controller is developed for multi-agent target tracking problems with exponential convergence guarantees. Previous RISE-based approaches for multi-agent systems required 2-hop communication, limiting practical applicability. New insights from a Lyapunov-based design-analysis approach are used to eliminate the need for multi-hop communication required in previous literature, while yielding exponential target tracking. The new insights include the development of a new P-function that works in tandem with the graph interaction matrix in the Lyapunov function. Nonsmooth Lyapunov-based stability analysis methods are used to yield semi-global exponential convergence to the target agent state despite the presence of bounded disturbances with bounded derivatives. The resulting outcome is a controller that achieves exponential target tracking with only local information exchange between neighboring agents.
comment: 6 pages, 1 figure
Systems and Control (EESS)
From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems
This paper envisions 6G as a self-evolving telecom ecosystem, where AI-driven intelligence enables dynamic adaptation beyond static connectivity. We explore the key enablers of autonomous communication systems, spanning reconfigurable infrastructure, adaptive middleware, and intelligent network functions, alongside multi-agent collaboration for distributed decision-making. We explore how these methodologies align with emerging industrial IoT frameworks, ensuring seamless integration within digital manufacturing processes. Our findings emphasize the potential for improved real-time decision-making, optimizing efficiency, and reducing latency in networked control systems. The discussion addresses ethical challenges, research directions, and standardization efforts, concluding with a technology stack roadmap to guide future developments. By leveraging state-of-the-art 6G network management techniques, this research contributes to the next generation of intelligent automation solutions, bridging the gap between theoretical advancements and real-world industrial applications.
Decoupling Periodic Systems: An Algebraic Approach
This paper addresses the problem of row-by-row (or diagonal) decoupling of discrete-time linear multi-input multi-output systems with periodic time-varying coefficients using periodic state feedback. Previous solutions have tackled row-by-row decoupling using dynamic compensation for square systems and block-decoupling through regular state feedback for nonsquare systems with more outputs than inputs. While it appears likely that a row-by-row state feedback solution for square systems can be deduced from these findings, a direct argument seems more appropriate here as it presents a natural extension for decoupling nonsquare systems with more inputs than outputs. This extension, which necessitates nonregular state feedback, has yet to be explored for periodic systems. Our approach is purely algebraic, based on a time-invariant representation of the periodic system.
comment: 8 pages, submitted February 5, 2025, to IEEE Transactions on Automatic Control
Integrated design of system structure and delayed resonator towards efficient non-collocated vibration absorption
The problem of non-collocated vibration absorption by a delayed resonator is addressed with emphasis on system fatigue resistance and energy efficiency of control actions. The analysis is performed for a system consisting of an arbitrary large series of flexibly linked single-degree-of-freedom masses. For the stage where the vibration of the target mass is fully absorbed by the non-collocated resonator, key forces, motion amplitudes and potential energies across the system structure are assessed. Next, a complete parameter set of the resonator gain and delay is derived, and the actuation force and power needed by the resonator for the full vibration absorption is determined. The derived quantities are utilized in forming an optimization problem to balance minimal risk of fatigue across the system structure and power needed by the resonator, under the closed loop stability and parameter constraints. Next to the gain and delay of the resonator, selected structural parameters of the system are used as variables in the constrained nonlinear optimization problem. Experimental and numerical case studies are included to demonstrate benefits of the proposed integrated structural and control design.
comment: 41 pages, 8 figures, 4 tables, Accepted 4 April 2025, Available online 18 April 2025, Version of Record 6 May 2025
Robust Aperiodic Sampled-Data Washout Control for Uncertain Affine Systems
In this paper, we address the problem of designing an aperiodic sampled-data controller stabilizing the zero-input equilibrium of an uncertain affine plant. The closed-loop system is modeled as a hybrid dynamical system incorporating a timer triggering the occurrence of the sampling events and two memory states storing the value of the controller state and controller output at each sampling time. Necessary and sufficient conditions on the controller parameters are given to establish the sought property. A constructive controller design algorithm based on sum-of-squares programming is given. A numerical example illustrates the effectiveness of the approach.
comment: extended version of the paper presented at IFAC ROCOND 2025
Long Duration Inspection of GNSS-Denied Environments with a Tethered UAV-UGV Marsupial System
Unmanned Aerial Vehicles (UAVs) have become essential tools in inspection and emergency response operations due to their high maneuverability and ability to access hard-to-reach areas. However, their limited battery life significantly restricts their use in long-duration missions. This paper presents a novel tethered marsupial robotic system composed of a UAV and an Unmanned Ground Vehicle (UGV), specifically designed for autonomous, long-duration inspection tasks in Global Navigation Satellite System (GNSS)-denied environments. The system extends the UAV's operational time by supplying power through a tether connected to high-capacity battery packs carried by the UGV. We detail the hardware architecture based on off-the-shelf components to ensure replicability and describe our full-stack software framework, which is composed of open-source components and built upon the Robot Operating System (ROS). The proposed software architecture enables precise localization using a Direct LiDAR Localization (DLL) method and ensures safe path planning and coordinated trajectory tracking for the integrated UGV-tether-UAV system. We validate the system through three field experiments: (1) a manual flight endurance test to estimate the operational duration, (2) an autonomous navigation test, and (3) an inspection mission to demonstrate autonomous inspection capabilities. Experimental results confirm the robustness and autonomy of the system, its capacity to operate in GNSS-denied environments, and its potential for long-endurance, autonomous inspection and monitoring tasks.
comment: 30 pages, 15 figures, 3 tables, 1 algorithm. Submitted to Journal of Intelligent & Robotic Systems
Evaluation of Voltage Unbalance Metrics in Distribution Networks with High DER Penetration
Voltage unbalance, caused by variations in voltage magnitude and phase angle, is a significant power quality issue in three-phase systems, leading to equipment inefficiencies and increased system losses. The integration of distributed energy resources (DER) into the grid adds complexity, as DER can either reduce or worsen voltage unbalance, depending on factors such as grid configuration and the distribution of loads and DER themselves. This study explores the effects of DER penetration on voltage unbalance levels and the accuracy of the different indices most commonly used to quantify this unbalance. The results highlight the varying impacts of DER on unbalance and index performance, emphasizing the need for effective strategies to assess voltage unbalance in modern distribution systems.
CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent latency-accuracy trade-off under resource constraints. Existing real-time DNN scheduling approaches often treat models generically, failing to leverage Transformer-specific properties for efficient resource allocation. To address these challenges, we propose CF-DETR, an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework NPFP**. CF-DETR employs three key strategies (A1: coarse-to-fine inference, A2: selective fine inference, A3: multi-level batch inference) that exploit Transformer properties to dynamically adjust patch granularity and attention scope based on object criticality, aiming to satisfy R2. The NPFP** scheduling framework (A4) orchestrates these adaptive mechanisms A1-A3. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (ensuring R1), and an optional fine subtask for enhanced overall accuracy (R2), while managing individual and batched execution. Our extensive evaluations on server, GPU-enabled embedded platforms, and actual AV platforms demonstrate that CF-DETR, under an NPFP** policy, successfully meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy compared to existing baselines across diverse AV workloads.
comment: 12 pages
Optimizing Connectivity and Scheduling of Near/Far Field Users in Massive MIMO NOMA System
It is envisioned that the next generations of wireless communication environment will be characterized with dense traffic demand due to the prediction that there will be large numbers of active users. Hence, it is important to find a solution to deal with such dense numbers of users. This paper investigates optimizing the connectivity and users scheduling to improve the performance of near and far field users in a downlink, multiuser, massive MIMO-NOMA system. For the considered system model, combining NOMA side by side with massive MIMO offers a great opportunity to exploit the available radio resources and boost the overall system efficiency. The paper proposes separate clustering of near field users and far field users. It also proposes using a beamforming scheme to separately serve the users within each cluster. However, NOMA is proposed to be applied among all users to boost resource sharing. In particular, a cognitive-NOMA beamforming scheme and NOMA themed beamforming are proposed to serve the users within each cluster, and they are compared against random beamforming from literature. Simulation results show that both of the proposed beamforming schemes proved their superiority as compared to random beamforming. Several scheduling techniques were also considered in this paper to examine possible solutions for boosting the system performance considered, namely, priority, joint, dynamic, and fairness-based scheduling techniques for both near field and far field users. The paper also proposes a suboptimal, fairness aiming and gradual allocation approach for allocating the transmission power among the users. The results show that user-clustering offers better connectivity and scheduling performance than the case where no clustering is applied.
Latent Representations for Control Design with Provable Stability and Safety Guarantees
We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques -- such as Lyapunov or barrier functions -- that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Importantly, our analysis contains several important implications for learning latent spaces and dynamics, by highlighting the necessary geometric properties which need to be preserved by the latent space, in addition to providing concrete loss functions for dynamics reconstruction that are directly related to control design. We conclude by demonstrating the applicability of our theory to two case studies: (1) stabilization of a cartpole system, and (2) collision avoidance for a two vehicle system.
System Identification for Virtual Sensor-Based Model Predictive Control: Application to a 2-DoF Direct-Drive Robotic Arm
Nonlinear Model Predictive Control (NMPC) offers a powerful approach for controlling complex nonlinear systems, yet faces two key challenges. First, accurately modeling nonlinear dynamics remains difficult. Second, variables directly related to control objectives often cannot be directly measured during operation. Although high-cost sensors can acquire these variables during model development, their use in practical deployment is typically infeasible. To overcome these limitations, we propose a Predictive Virtual Sensor Identification (PVSID) framework that leverages temporary high-cost sensors during the modeling phase to create virtual sensors for NMPC implementation. We validate PVSID on a Two-Degree-of-Freedom (2-DoF) direct-drive robotic arm with complex joint interactions, capturing tip position via motion capture during modeling and utilize an Inertial Measurement Unit (IMU) in NMPC. Experimental results show our NMPC with identified virtual sensors achieves precise tip trajectory tracking without requiring the motion capture system during operation. PVSID offers a practical solution for implementing optimal control in nonlinear systems where the measurement of key variables is constrained by cost or operational limitations.
comment: 6 pages, 5 figures, submitted to L-CSS with CDC 2025 option
Interturn Fault Detection in IPMSMs: Two Adaptive Observer-based Solutions
In this paper we address the problem of online detection of inter-turn short-circuit faults (ITSCFs) that occur in permanent magnet synchronous motors (PMSMs). We propose two solutions to this problem: (i) a very simple linear observer and (ii) a generalized parameter estimation based observer, that incorporates a high performance estimator -- with both observers detecting the short-circuit current and the fault intensity. Although the first solution guarantees the detection of the fault exponentially fast, the rate of convergence is fully determined by the motor parameters that, in some cases, may be too slow. The second observer, on the other hand, ensures finite convergence time under the weakest assumption of interval excitation. To make the observers adaptive, we develop a parameter estimator that, in the case of isotropic PMSMs, estimates on-line (exponentially fast) the resistance and inductance of the motor. It should be underscored that, in contrast with existing observers (including the widely popular Kalman filter) that provide indirect information of the fault current, our observers provide explicit one -- namely the amplitude of the fault current. The performance of both observers, in their linear and generalized parameter estimation-based versions, is illustrated with realistic simulation studies.
Voltage Control of the Boost Converter: PI vs. Nonlinear Passivity-based Control
We carry-out a detailed analysis of direct voltage control of a Boost converter feeding a simple resistive load. First, we prove that using a classical PI control to stabilize a desired equilibrium leads to a very complicated dynamic behavior consisting of two equilibrium points, one of them always unstable for all PI gains and circuit parameter values. Interestingly, the second equilibrium point may be rendered stable -- but for all tuning gains leading to an extremely large value of the circuit current and the controller integrator state. Moreover, if we neglect the resistive effect of the inductor, there is only one equilibrium and it is always unstable. From a practical point of view, it is important to note that the only useful equilibrium point is that of minimum current and that, in addition, there is always a resistive component in the inductor either by its parasitic resistance or by the resistive component of the output impedance of the previous stage. In opposition to this troublesome scenario we recall three nonlinear voltage-feedback controllers, that ensure asymptotic stability of the desired equilibrium with simple gain tuning rules, an easily defined domain of attraction and smooth transient behavior. Two of them are very simple, nonlinear, static voltage feedback rules, while the third one is a variation of the PID scheme called PID-Passivity-based Control (PBC). In its original formulation PID-PBC requires full state measurement, but we present a modified version that incorporates a current observer. All three nonlinear controllers are designed following the principles of PBC, which has had enormous success in many engineering applications.
Detecting Switching Attacks On Traffic Flow Regulation For Changing Driving Patterns
Modern traffic management systems increasingly adopt hierarchical control strategies for improved efficiency and scalability, where a local traffic controller mode is chosen by a supervisory controller based on the changing large-scale driving patterns. Unfortunately, such local metering controllers are also vulnerable to cyberattacks that can disrupt the controller switching, leading to undesired, inefficient, and even unsafe traffic operations. Additionally, the detection of such attacks becomes challenging when the operational mode of the traffic is uncertain and the operational mode identification is delayed. Thus, in this work, we propose a cyberattack detection scheme to detect the compromised controller switching in ramp metering for an uncertain, multimodal macroscopic traffic operation of a freeway segment. In particular, we propose a bank of detectors corresponding to each admissible traffic mode that can compensate for the uncertain traffic mode of the freeway. Furthermore, we utilize backstepping tools along with Lyapunov function theory to achieve analytical performance guarantees for the detector, such as nominal exponential stability, anomaly/uncertainty-to-residual stability, robustness, and sensitivity. Finally, we demonstrate the efficacy of the proposed detection scheme through simulations of free traffic under realistic traffic parameters, uncertainties, and commonly occurring attack scenarios.
comment: 17 pages, 5 figures, 3 tables
Sensitivity of DC Network Representation for GIC Analysis
Geomagnetic disturbances are a threat to the reliability and security of our national critical energy infrastructures. These events specifically result in geomagnetically induced currents, which can cause damage to transformers due to magnetic saturation. In order to mitigate these effects, blocker devices must be placed in optimal locations. Finding this placement requires a dc representation of the ac transmission lines, which this paper discusses. Different decisions in this process, including the method of representing the blocking devices, result in significant variations to the power loss calculations. To analyze these effects, we conclude the paper by comparing the losses on a sample network with different modeling implementations.
comment: 6 pages, 1 table, 7 figures
Categorical Lyapunov Theory II: Stability of Systems
Lyapunov's theorem provides a foundational characterization of stable equilibrium points in dynamical systems. In this paper, we develop a framework for stability for F-coalgebras. We give two definitions for a categorical setting in which we can study the stability of a coalgebra for an endofunctor F. One is minimal and better suited for concrete settings, while the other is more intricate and provides a richer theory. We prove a Lyapunov theorem for both notions of setting for stability, and a converse Lyapunov theorem for the second.
comment: 33 pages
A Benchmark Reference for ESP32-CAM Module SP32
The ESP32-CAM is one of the most widely adopted open-source modules for prototyping embedded vision applications. Since its release in 2019, it has gained popularity among both hobbyists and professional developers due to its affordability, versatility, and integrated wireless capabilities. Despite its widespread use, comprehensive documentation of the performance metrics remains limited. This study addresses this gap by collecting and analyzing over six hours of real-time video streaming logs across all supported resolutions of the OV2640 image sensor, tested under five distinct voltage conditions via an HTTP-based WiFi connection. A long standing bug in the official Arduino ESP32 driver, responsible for inaccurate frame rate logging, was fixed. The resulting analysis includes key performance metrics such as instantaneous and average frame rate, total streamed data, transmission count, and internal chip temperature. The influence of varying power levels was evaluated to assess the reliability of the module.
comment: Full work available at GitHub: https://github.com/TNeutron/ESP32-CAM-Performence-Reference-Benchmark
A Conceptual Introduction to Hetero-functional Graph Theory for Systems-of-Systems
A defining feature of twenty first century engineering challenges is their inherent complexity, demanding the convergence of knowledge across diverse disciplines. Establishing consistent methodological foundations for engineering systems remains a challenge -- one that both systems engineering and network science have sought to address. Model-based systems engineering (MBSE) has recently emerged as a practical, interdisciplinary approach for developing complex systems from concept through implementation. In contrast, network science focuses on the quantitative analysis of networks present within engineering systems. This paper introduces hetero-functional graph theory (HFGT) as a conceptual bridge between these two fields, serving as a tutorial for both communities. For systems engineers, HFGT preserves the heterogeneity of conceptual and ontological constructs in MBSE, including system form, function, and concept. For network scientists, it provides multiple graph-based data structures enabling matrix-based quantitative analysis. The modeling process begins with ontological foundations, defining an engineering system as an abstraction and representing it with a model. Model fidelity is assessed using four linguistic properties: soundness, completeness, lucidity, and laconicity. A meta-architecture is introduced to manage the convergence challenges between domain-specific reference architectures and case-specific instantiations. Unlike other meta-architectures, HFGT is rooted in linguistic structures, modeling resources as subjects, system processes as predicates, and operands-such as matter, energy, organisms, information, and money-as objects. These elements are integrated within a system meta-architecture expressed in the Systems Modeling Language (SysML). The paper concludes by offering guidance for further reading.
comment: arXiv admin note: text overlap with arXiv:2101.07220
A Hetero-functional Graph Theory Perspective of Engineering Management of Mega-Projects
Megaprojects are large-scale, complex, and one-off engineering endeavors that require significant investments from a public or private sector. Such projects generally cost more than a billion dollars, take many years to develop and construct, involve stakeholders both in the public and private sectors, and impact millions of people. Most of the extant megaproject research is concerned with understanding why the engineering management of megaprojects fails so frequently and which dimensions make them so difficult to manage, including size, uncertainty, complexity, urgency, and institutional structure \cite{denicol:2020:00}. Recently, the literature on mega-projects has advocated for a convergence of the engineering management and production system management literature. To that end, this paper proposes the use of Model-Based System Engineering (MBSE) and Hetero-Functional Graph Theory (HFGT), where the latter, quite interestingly, finds its origins in the mass-customized production system literature. More specifically, HFGT was developed so that the physical and informatic parts of production system planning, operations, and decision-making are readily reconfigured to support production customization at scale. As the literature on megaprojects is rapidly evolving with a significant amount of divergence between authors, this report builds upon the recent and extensive megaproject literature review provided by Denicol et. al. \cite{denicol:2020:00}. The paper concludes that MBSE and HFGT provide a means for addressing many of the concluding recommendations provided by Denicol et. al. MBSE and HFGT not only align with current research on megaprojects but also push the boundaries of how the engineering management of megaprojects can gain a unified theoretical foundation.
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*
Graph search planners have been widely used for 3D path planning in the literature, and Euclidean Distance Fields (EDFs) are increasingly being used as a representation of the environment. However, to the best of our knowledge, the integration of EDFs into heuristic planning has been carried out in a loosely coupled fashion, dismissing EDF properties that can be used to accelerate/improve the planning process and enhance the safety margins of the resultant trajectories. This paper presents a fast graph search planner based on a modified Lazy Theta* planning algorithm for aerial robots in challenging 3D environments that exploits the EDF properties. The proposed planner outperforms classic graph search planners in terms of path smoothness and safety. It integrates EDFs as environment representation and directly generates fast and smooth paths avoiding the use of post-processing methods; it also considers the analytical properties of EDFs to obtain an approximation of the EDF cost along the line-of-sight segments and to reduce the number of visibility neighbours, which directly impacts the computation time. Moreover, we demonstrate that the proposed EDF-based cost function satisfies the triangle inequality, which reduces calculations during exploration and, hence, computation time. Many experiments and comparatives are carried out in 3D challenging indoor and outdoor simulation environments to evaluate and validate the proposed planner. The results show an efficient and safe planner in these environments.
An Advanced Cyber-Physical System Security Testbed for Substation Automation
A Cyber-Physical System (CPS) testbed serves as a powerful platform for testing and validating cyber intrusion detection and mitigation strategies in substations. This study presents the design and development of a CPS testbed that can effectively assess the real-time dynamics of a substation. Cyber attacks exploiting IEC 61850-based SV and GOOSE protocols are demonstrated using the testbed, along with an analysis on attack detection. Realistic timing measurements are obtained, and the time frames for deploying detection and mitigation strategies are evaluated.
comment: CIGRE Symposium 2025, Trondheim, Norway
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
We extend the generalized global Hartman-Grobman theorem by Kvalheim and Sontag for flows to a case of asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 4 pages, comments are welcome. V2 contains slightly more details regarding the manifold case
Asymptotic stability equals exponential stability -- while you twist your eyes
Suppose that two vector fields on a smooth manifold render some equilibrium point globally asymptotically stable (GAS). We show that there exists a homotopy between the corresponding semiflows such that this point remains GAS along this homotopy.
comment: Update: condensed version (25 pages) including a new section on ISS
LLM-Enhanced Symbolic Control for Safety-Critical Applications
Motivated by Smart Manufacturing and Industry 4.0, we introduce a framework for synthesizing Abstraction-Based Controller Design (ABCD) for reach-avoid problems from Natural Language (NL) specifications using Large Language Models (LLMs). A Code Agent interprets an NL description of the control problem and translates it into a formal language interpretable by state-of-the-art symbolic control software, while a Checker Agent verifies the correctness of the generated code and enhances safety by identifying specification mismatches. Evaluations show that the system handles linguistic variability and improves robustness over direct planning with LLMs. The proposed approach lowers the barrier to formal control synthesis by enabling intuitive, NL-based task definition while maintaining safety guarantees through automated validation.
SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT's intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
Hierarchical Neuro-Symbolic Decision Transformer
We present a hierarchical neuro-symbolic control framework that tightly couples a classical symbolic planner with a transformer-based policy to address long-horizon decision-making under uncertainty. At the high level, the planner assembles an interpretable sequence of operators that guarantees logical coherence with task constraints, while at the low level each operator is rendered as a sub-goal token that conditions a decision transformer to generate fine-grained actions directly from raw observations. This bidirectional interface preserves the combinatorial efficiency and explainability of symbolic reasoning without sacrificing the adaptability of deep sequence models, and it permits a principled analysis that tracks how approximation errors from both planning and execution accumulate across the hierarchy. Empirical studies in stochastic grid-world domains demonstrate that the proposed method consistently surpasses purely symbolic, purely neural and existing hierarchical baselines in both success and efficiency, highlighting its robustness for sequential tasks.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines represent complex highly nonlinear dynamical systems. Deriving their physics-based models can be challenging as it requires performance characteristics, that are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in a better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.
comment: 38 pages, 28 figures
Quantitative Verification with Neural Networks
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.
comment: The conference version of this manuscript appeared at CONCUR 2023
Global Tensor Motion Planning
Batch planning is increasingly necessary to quickly produce diverse and quality motion plans for downstream learning applications, such as distillation and imitation learning. This paper presents Global Tensor Motion Planning (GTMP) -- a sampling-based motion planning algorithm comprising only tensor operations. We introduce a novel discretization structure represented as a random multipartite graph, enabling efficient vectorized sampling, collision checking, and search. We provide a theoretical investigation showing that GTMP exhibits probabilistic completeness while supporting modern GPU/TPU. Additionally, by incorporating smooth structures into the multipartite graph, GTMP directly plans smooth splines without requiring gradient-based optimization. Experiments on lidar-scanned occupancy maps and the MotionBenchMarker dataset demonstrate GTMP's computation efficiency in batch planning compared to baselines, underscoring GTMP's potential as a robust, scalable planner for diverse applications and large-scale robot learning tasks.
comment: 8 pages, 3 figures. Accepted at IEEE Robotics and Automation Letters 2025
Geometry of the Feasible Output Regions of Grid-Interfacing Inverters with Current Limits
Many resources in the grid connect to power grids via programmable grid-interfacing inverters that can provide grid services and offer greater control flexibility and faster response times compared to synchronous generators. However, the current through the inverter needs to be limited to protect the semiconductor components. Existing controllers are designed using somewhat ad hoc methods, for example, by adding current limiters to preexisting control loops, which can lead to stability issues or overly conservative operations. In this paper, we study the geometry of the feasible output region of a current-limited inverter. We show that under a commonly used model, the feasible region is convex. We provide an explicit characterization of this region, which allows us to efficiently find the optimal operating points of the inverter. We demonstrate how knowing the feasible set and its convexity allows us to improve upon existing grid-forming inverters such that their steady-state currents always remain within the current magnitude limit, whereas standard grid-forming controllers can lead to instabilities and violations.
comment: To appear in IEEE L-CSS
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret
We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
comment: Accepted to be presented at L4DC'25 (Oral)
Constraints and Variables Reduction for Optimal Power Flow Using Hierarchical Graph Neural Networks with Virtual Node-Splitting
Power system networks are often modeled as homogeneous graphs, which limits the ability of graph neural network (GNN) to capture individual generator features at the same nodes. By introducing the proposed virtual node-splitting strategy, generator-level attributes like costs, limits, and ramp rates can be fully captured by GNN models, improving GNN's learning capacity and prediction accuracy. Optimal power flow (OPF) problem is used for real-time grid operations. Limited timeframe motivates studies to create size-reduced OPF (ROPF) models to relieve the computational complexity. In this paper, with virtual node-splitting, a novel two-stage adaptive hierarchical GNN is developed to (i) predict critical lines that would be congested, and then (ii) predict base generators that would operate at the maximum capacity. This will substantially reduce the constraints and variables needed for OPF, creating the proposed ROPFLG model with reduced monitor lines and reduced generator-specific variables and constraints. Two ROPF models, ROPFL and ROPFG, with just reduced lines or generators respectively, are also implemented as additional benchmark models. Case studies show that the proposed ROPFLG consistently outperforms the benchmark full OPF (FOPF) and the other two ROPF methods, achieving significant computational time savings while reliably finding optimal solutions.
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach IJCAI 2025
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA) scheme, aiming at precisely introducing the extent of poisoning in a subtle controlled manner. It operates with a predefined objective, such as reducing global model's prediction accuracy by 10%. FedSA integrates robust nonlinear control-Sliding Mode Control (SMC) theory with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
comment: This paper is to appear in IJCAI 2025, code available at: https://github.com/Halsey777/FedSA
Dexterous Control of an 11-DOF Redundant Robot for CT-Guided Needle Insertion With Task-Oriented Weighted Policies
Computed tomography (CT)-guided needle biopsies are critical for diagnosing a range of conditions, including lung cancer, but present challenges such as limited in-bore space, prolonged procedure times, and radiation exposure. Robotic assistance offers a promising solution by improving needle trajectory accuracy, reducing radiation exposure, and enabling real-time adjustments. In our previous work, we introduced a redundant robotic platform designed for dexterous needle insertion within the confined CT bore. However, its limited base mobility restricts flexible deployment in clinical settings. In this study, we present an improved 11-degree-of-freedom (DOF) robotic system that integrates a 6-DOF robotic base with a 5-DOF cable-driven end-effector, significantly enhancing workspace flexibility and precision. With the hyper-redundant degrees of freedom, we introduce a weighted inverse kinematics controller with a two-stage priority scheme for large-scale movement and fine in-bore adjustments, along with a null-space control strategy to optimize dexterity. We validate our system through both simulation and real-world experiments, demonstrating superior tracking accuracy and enhanced manipulability in CT-guided procedures. The study provides a strong case for hyper-redundancy and null-space control formulations for robot-assisted needle biopsy scenarios.
Online ResNet-Based Adaptive Control for Nonlinear Target Tracking
A generalized ResNet architecture for adaptive control of nonlinear systems with black box uncertainties is developed. The approach overcomes limitations in existing methods by incorporating pre-activation shortcut connections and a zeroth layer block that accommodates different input-output dimensions. The developed Lyapunov-based adaptation law establishes exponential convergence to a neighborhood of the target state despite unknown dynamics and disturbances. Furthermore, the theoretical results are validated through a comparative experiment.
comment: 6 pages, 3 figures
Distributed RISE-based Control for Exponential Heterogeneous Multi-Agent Target Tracking of Second-Order Nonlinear Systems
A distributed implementation of a Robust Integral of the Sign of the Error (RISE) controller is developed for multi-agent target tracking problems with exponential convergence guarantees. Previous RISE-based approaches for multi-agent systems required 2-hop communication, limiting practical applicability. New insights from a Lyapunov-based design-analysis approach are used to eliminate the need for multi-hop communication required in previous literature, while yielding exponential target tracking. The new insights include the development of a new P-function that works in tandem with the graph interaction matrix in the Lyapunov function. Nonsmooth Lyapunov-based stability analysis methods are used to yield semi-global exponential convergence to the target agent state despite the presence of bounded disturbances with bounded derivatives. The resulting outcome is a controller that achieves exponential target tracking with only local information exchange between neighboring agents.
comment: 6 pages, 1 figure
Robotics
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications -- parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
comment: Project webpage: https://younggyo.me/fast_td3
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, its development has been long hampered by the lack of suitable simulator and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.
SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Imitation learning advances robot capabilities by enabling the acquisition of diverse behaviors from human demonstrations. However, large-scale datasets used for policy training often introduce substantial variability in quality, which can negatively impact performance. As a result, automatically curating datasets by filtering low-quality samples to improve quality becomes essential. Existing robotic curation approaches rely on costly manual annotations and perform curation at a coarse granularity, such as the dataset or trajectory level, failing to account for the quality of individual state-action pairs. To address this, we introduce SCIZOR, a self-supervised data curation framework that filters out low-quality state-action pairs to improve the performance of imitation learning policies. SCIZOR targets two complementary sources of low-quality data: suboptimal data, which hinders learning with undesirable actions, and redundant data, which dilutes training with repetitive patterns. SCIZOR leverages a self-supervised task progress predictor for suboptimal data to remove samples lacking task progression, and a deduplication module operating on joint state-action representation for samples with redundant patterns. Empirically, we show that SCIZOR enables imitation learning policies to achieve higher performance with less data, yielding an average improvement of 15.4% across multiple benchmarks. More information is available at: https://ut-austin-rpl.github.io/SCIZOR/
VR-Based Control of Multi-Copter Operation
We aim to use virtual reality (VR) to improve the spatial awareness of pilots by real-time scanning of the environment around the drone using onboard sensors, live streaming of this environment to a VR headset, and rendering a virtual representation of the drone and its environment for the pilot. This way, the pilot can see the immediate environment of the drone up close from a third-person perspective, as opposed to the first-person perspective that most drone cameras provide. This provides much more information about the drone surroundings for the pilot while operating the drone than existing teleoperation solutions. Previous solutions using VR have relied upon pre-made designs of the environment, which makes it difficult to adapt to changing environments. Our solution, in contrast, scans the environment as you fly, making it much more flexible for use in unknown environments.
Spot-On: A Mixed Reality Interface for Multi-Robot Cooperation
Recent progress in mixed reality (MR) and robotics is enabling increasingly sophisticated forms of human-robot collaboration. Building on these developments, we introduce a novel MR framework that allows multiple quadruped robots to operate in semantically diverse environments via a MR interface. Our system supports collaborative tasks involving drawers, swing doors, and higher-level infrastructure such as light switches. A comprehensive user study verifies both the design and usability of our app, with participants giving a "good" or "very good" rating in almost all cases. Overall, our approach provides an effective and intuitive framework for MR-based multi-robot collaboration in complex, real-world scenarios.
From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven human user agent exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
Fully Packed and Ready to Go: High-Density, Rearrangement-Free, Grid-Based Storage and Retrieval
Grid-based storage systems with uniformly shaped loads (e.g., containers, pallets, totes) are commonplace in logistics, industrial, and transportation domains. A key performance metric for such systems is the maximization of space utilization, which requires some loads to be placed behind or below others, preventing direct access to them. Consequently, dense storage settings bring up the challenge of determining how to place loads while minimizing costly rearrangement efforts necessary during retrieval. This paper considers the setting involving an inbound phase, during which loads arrive, followed by an outbound phase, during which loads depart. The setting is prevalent in distribution centers, automated parking garages, and container ports. In both phases, minimizing the number of rearrangement actions results in more optimal (e.g., fast, energy-efficient, etc.) operations. In contrast to previous work focusing on stack-based systems, this effort examines the case where loads can be freely moved along the grid, e.g., by a mobile robot, expanding the range of possible motions. We establish that for a range of scenarios, such as having limited prior knowledge of the loads' arrival sequences or grids with a narrow opening, a (best possible) rearrangement-free solution always exists, including when the loads fill the grid to its capacity. In particular, when the sequences are fully known, we establish an intriguing characterization showing that rearrangement can always be avoided if and only if the open side of the grid (used to access the storage) is at least 3 cells wide. We further discuss useful practical implications of our solutions.
COSMOS: A Data-Driven Probabilistic Time Series simulator for Chemical Plumes across Spatial Scales
The development of robust odor navigation strategies for automated environmental monitoring applications requires realistic simulations of odor time series for agents moving across large spatial scales. Traditional approaches that rely on computational fluid dynamics (CFD) methods can capture the spatiotemporal dynamics of odor plumes, but are impractical for large-scale simulations due to their computational expense. On the other hand, puff-based simulations, although computationally tractable for large scales and capable of capturing the stochastic nature of plumes, fail to reproduce naturalistic odor statistics. Here, we present COSMOS (Configurable Odor Simulation Model over Scalable Spaces), a data-driven probabilistic framework that synthesizes realistic odor time series from spatial and temporal features of real datasets. COSMOS generates similar distributions of key statistical features such as whiff frequency, duration, and concentration as observed in real data, while dramatically reducing computational overhead. By reproducing critical statistical properties across a variety of flow regimes and scales, COSMOS enables the development and evaluation of agent-based navigation strategies with naturalistic odor experiences. To demonstrate its utility, we compare odor-tracking agents exposed to CFD-generated plumes versus COSMOS simulations, showing that both their odor experiences and resulting behaviors are quite similar.
comment: 16 pages, 4 primary figures
Zero-Shot 3D Visual Grounding from Vision-Language Models CVPR 2025
3D Visual Grounding (3DVG) seeks to locate target objects in 3D scenes using natural language descriptions, enabling downstream applications such as augmented reality and robotics. Existing approaches typically rely on labeled 3D data and predefined categories, limiting scalability to open-world settings. We present SeeGround, a zero-shot 3DVG framework that leverages 2D Vision-Language Models (VLMs) to bypass the need for 3D-specific training. To bridge the modality gap, we introduce a hybrid input format that pairs query-aligned rendered views with spatially enriched textual descriptions. Our framework incorporates two core components: a Perspective Adaptation Module that dynamically selects optimal viewpoints based on the query, and a Fusion Alignment Module that integrates visual and spatial signals to enhance localization precision. Extensive evaluations on ScanRefer and Nr3D confirm that SeeGround achieves substantial improvements over existing zero-shot baselines -- outperforming them by 7.7% and 7.1%, respectively -- and even rivals fully supervised alternatives, demonstrating strong generalization under challenging conditions.
comment: 3D-LLM/VLA @ CVPR 2025; Project Page at https://seeground.github.io/
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
Recent advancements in world models have revolutionized dynamic environment simulation, allowing systems to foresee future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches exhibit deficiencies in maintaining robust 3D geometric consistency or accumulating artifacts during occlusion handling, both critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering based on the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model can generalize to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.
comment: code will be released at https://github.com/antonioo-c/GeoDrive
Efficient Precision-Scalable Hardware for Microscaling (MX) Processing in Robotics Learning
Autonomous robots require efficient on-device learning to adapt to new environments without cloud dependency. For this edge training, Microscaling (MX) data types offer a promising solution by combining integer and floating-point representations with shared exponents, reducing energy consumption while maintaining accuracy. However, the state-of-the-art continuous learning processor, namely Dacapo, faces limitations with its MXINT-only support and inefficient vector-based grouping during backpropagation. In this paper, we present, to the best of our knowledge, the first work that addresses these limitations with two key innovations: (1) a precision-scalable arithmetic unit that supports all six MX data types by exploiting sub-word parallelism and unified integer and floating-point processing; and (2) support for square shared exponent groups to enable efficient weight handling during backpropagation, removing storage redundancy and quantization overhead. We evaluate our design against Dacapo under iso-peak-throughput on four robotics workloads in TSMC 16nm FinFET technology at 500MHz, reaching a 25.6% area reduction, a 51% lower memory footprint, and 4x higher effective training throughput while achieving comparable energy-efficiency, enabling efficient robotics continual learning at the edge.
comment: To appear in 2025 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED 2025)
Task-Driven Implicit Representations for Automated Design of LiDAR Systems
Imaging system design is a complex, time-consuming, and largely manual process; LiDAR design, ubiquitous in mobile devices, autonomous vehicles, and aerial imaging platforms, adds further complexity through unique spatial and temporal sampling requirements. In this work, we propose a framework for automated, task-driven LiDAR system design under arbitrary constraints. To achieve this, we represent LiDAR configurations in a continuous six-dimensional design space and learn task-specific implicit densities in this space via flow-based generative modeling. We then synthesize new LiDAR systems by modeling sensors as parametric distributions in 6D space and fitting these distributions to our learned implicit density using expectation-maximization, enabling efficient, constraint-aware LiDAR system design. We validate our method on diverse tasks in 3D vision, enabling automated LiDAR system design across real-world-inspired applications in face scanning, robotic tracking, and object detection.
UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments
Recent 3D Gaussian Splatting (3DGS) techniques for Visual Simultaneous Localization and Mapping (SLAM) have significantly progressed in tracking and high-fidelity mapping. However, their sequential optimization framework and sensitivity to dynamic objects limit real-time performance and robustness in real-world scenarios. We present UP-SLAM, a real-time RGB-D SLAM system for dynamic environments that decouples tracking and mapping through a parallelized framework. A probabilistic octree is employed to manage Gaussian primitives adaptively, enabling efficient initialization and pruning without hand-crafted thresholds. To robustly filter dynamic regions during tracking, we propose a training-free uncertainty estimator that fuses multi-modal residuals to estimate per-pixel motion uncertainty, achieving open-set dynamic object handling without reliance on semantic labels. Furthermore, a temporal encoder is designed to enhance rendering quality. Concurrently, low-dimensional features are efficiently transformed via a shallow multilayer perceptron to construct DINO features, which are then employed to enrich the Gaussian field and improve the robustness of uncertainty prediction. Extensive experiments on multiple challenging datasets suggest that UP-SLAM outperforms state-of-the-art methods in both localization accuracy (by 59.8%) and rendering quality (by 4.57 dB PSNR), while maintaining real-time performance and producing reusable, artifact-free static maps in dynamic environments.The project: https://aczheng-cai.github.io/up_slam.github.io/
LiDAR Based Semantic Perception for Forklifts in Outdoor Environments
In this study, we present a novel LiDAR-based semantic segmentation framework tailored for autonomous forklifts operating in complex outdoor environments. Central to our approach is the integration of a dual LiDAR system, which combines forward-facing and downward-angled LiDAR sensors to enable comprehensive scene understanding, specifically tailored for industrial material handling tasks. The dual configuration improves the detection and segmentation of dynamic and static obstacles with high spatial precision. Using high-resolution 3D point clouds captured from two sensors, our method employs a lightweight yet robust approach that segments the point clouds into safety-critical instance classes such as pedestrians, vehicles, and forklifts, as well as environmental classes such as driveable ground, lanes, and buildings. Experimental validation demonstrates that our approach achieves high segmentation accuracy while satisfying strict runtime requirements, establishing its viability for safety-aware, fully autonomous forklift navigation in dynamic warehouse and yard environments.
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose \textbf{ForceVLA}, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. ForceVLA introduces \textbf{FVLMoE}, a force-aware Mixture-of-Experts fusion module that dynamically integrates pretrained visual-language embeddings with real-time 6-axis force feedback during action decoding. This enables context-aware routing across modality-specific experts, enhancing the robot's ability to adapt to subtle contact dynamics. We also introduce \textbf{ForceVLA-Data}, a new dataset comprising synchronized vision, proprioception, and force-torque signals across five contact-rich manipulation tasks. ForceVLA improves average task success by 23.2\% over strong $\pi_0$-based baselines, achieving up to 80\% success in tasks such as plug insertion. Our approach highlights the importance of multimodal integration for dexterous manipulation and sets a new benchmark for physically intelligent robotic control. Code and data will be released at https://sites.google.com/view/forcevla2025.
Efficient Dynamic Shielding for Parametric Safety Specifications
Shielding has emerged as a promising approach for ensuring safety of AI-controlled autonomous systems. The algorithmic goal is to compute a shield, which is a runtime safety enforcement tool that needs to monitor and intervene the AI controller's actions if safety could be compromised otherwise. Traditional shields are designed statically for a specific safety requirement. Therefore, if the safety requirement changes at runtime due to changing operating conditions, the shield needs to be recomputed from scratch, causing delays that could be fatal. We introduce dynamic shields for parametric safety specifications, which are succinctly represented sets of all possible safety specifications that may be encountered at runtime. Our dynamic shields are statically designed for a given safety parameter set, and are able to dynamically adapt as the true safety specification (permissible by the parameters) is revealed at runtime. The main algorithmic novelty lies in the dynamic adaptation procedure, which is a simple and fast algorithm that utilizes known features of standard safety shields, like maximal permissiveness. We report experimental results for a robot navigation problem in unknown territories, where the safety specification evolves as new obstacles are discovered at runtime. In our experiments, the dynamic shields took a few minutes for their offline design, and took between a fraction of a second and a few seconds for online adaptation at each step, whereas the brute-force online recomputation approach was up to 5 times slower.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, whose performance is comparable to fine-tuned DDIM policies while saving computation time for an average of 23.20%. Project Webpage: https://reinflow.github.io/
comment: 30 pages, 13 figures, 10 tables
A simulation framework for autonomous lunar construction work
We present a simulation framework for lunar construction work involving multiple autonomous machines. The framework supports modelling of construction scenarios and autonomy solutions, execution of the scenarios in simulation, and analysis of work time and energy consumption throughout the construction project. The simulations are based on physics-based models for contacting multibody dynamics and deformable terrain, including vehicle-soil interaction forces and soil flow in real time. A behaviour tree manages the operational logic and error handling, which enables the representation of complex behaviours through a discrete set of simpler tasks in a modular hierarchical structure. High-level decision-making is separated from lower-level control algorithms, with the two connected via ROS2. Excavation movements are controlled through inverse kinematics and tracking controllers. The framework is tested and demonstrated on two different lunar construction scenarios.
comment: 12 pages, 16 figures
From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving
Ensuring robust and generalizable autonomous driving requires not only broad scenario coverage but also efficient repair of failure cases, particularly those related to challenging and safety-critical scenarios. However, existing scenario generation and selection methods often lack adaptivity and semantic relevance, limiting their impact on performance improvement. In this paper, we propose \textbf{SERA}, an LLM-powered framework that enables autonomous driving systems to self-evolve by repairing failure cases through targeted scenario recommendation. By analyzing performance logs, SERA identifies failure patterns and dynamically retrieves semantically aligned scenarios from a structured bank. An LLM-based reflection mechanism further refines these recommendations to maximize relevance and diversity. The selected scenarios are used for few-shot fine-tuning, enabling targeted adaptation with minimal data. Experiments on the benchmark show that SERA consistently improves key metrics across multiple autonomous driving baselines, demonstrating its effectiveness and generalizability under safety-critical conditions.
Soft Electrothermal Meta-Actuator for Robust Multifunctional Control
Soft electrothermal actuators are of great interest in diverse application domains for their simplicity, compliance, and ease of control. However, the very nature of thermally induced mechanical actuation sets inherent operation constraints: unidirectional motion, environmental sensitivity, and slow response times limited by passive cooling. To overcome these constraints, we propose a meta-actuator architecture, which uses engineered heat transfer in thin films to achieve multifunctional operation. We demonstrate electrically selectable bidirectional motion with large deflection ($ \geq $28% of actuator length at 0.75 W), suppressed thermal sensitivity to ambient temperature changes when compared to conventional actuators (>100$ \times $ lower), and actively forced return to the rest state, which is 10 times faster than that with passive cooling. We further show that our meta-actuator approach enables extended ranges of motions for manipulating complex objects. Versatile soft gripper operations highlight the meta-actuator's potential for soft robotics and devices.
comment: 23 pages, 5 figures
Learning Compositional Behaviors from Demonstration and Language
We introduce Behavior from Language and Demonstration (BLADE), a framework for long-horizon robotic manipulation by integrating imitation learning and model-based planning. BLADE leverages language-annotated demonstrations, extracts abstract action knowledge from large language models (LLMs), and constructs a library of structured, high-level action representations. These representations include preconditions and effects grounded in visual perception for each high-level action, along with corresponding controllers implemented as neural network-based policies. BLADE can recover such structured representations automatically, without manually labeled states or symbolic definitions. BLADE shows significant capabilities in generalizing to novel situations, including novel initial states, external state perturbations, and novel goals. We validate the effectiveness of our approach both in simulation and on real robots with a diverse set of objects with articulated parts, partial observability, and geometric constraints.
comment: Presented at CoRL 2024 and as an Oral Presentation at the 2024 CoRL LEAP Workshop. The first two authors contributed equally. The last two authors jointly advised the project. For videos and additional results, visit: https://blade-bot.github.io/
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
Enhanced SIRRT*: A Structure-Aware RRT* for 2D Path Planning with Hybrid Smoothing and Bidirectional Rewiring
Sampling-based motion planners such as Rapidly-exploring Random Tree* (RRT*) and its informed variant IRRT* are widely used for optimal path planning in complex environments. However, these methods often suffer from slow convergence and high variance due to their reliance on random sampling, particularly when initial solution discovery is delayed. This paper presents Enhanced SIRRT* (E-SIRRT*), a structure-aware planner that improves upon the original SIRRT* framework by introducing two key enhancements: hybrid path smoothing and bidirectional rewiring. Hybrid path smoothing refines the initial path through spline fitting and collision-aware correction, while bidirectional rewiring locally optimizes tree connectivity around the smoothed path to improve cost propagation. Experimental results demonstrate that E-SIRRT* consistently outperforms IRRT* and SIRRT* in terms of initial path quality, convergence rate, and robustness across 100 trials. Unlike IRRT*, which exhibits high variability due to stochastic initialization, E-SIRRT* achieves repeatable and efficient performance through deterministic skeleton-based initialization and structural refinement.
Mastering Agile Tasks with Limited Trials
Embodied robots nowadays can already handle many real-world manipulation tasks. However, certain other real-world tasks (e.g., shooting a basketball into a hoop) are highly agile and require high execution precision, presenting additional challenges for methods primarily designed for quasi-static manipulation tasks. This leads to increased efforts in costly data collection, laborious reward design, or complex motion planning. Such tasks, however, are far less challenging for humans. Say a novice basketball player typically needs only $\sim$10 attempts to make their first successful shot, by roughly imitating a motion prior and then iteratively adjusting their motion based on the past outcomes. Inspired by this human learning paradigm, we propose the Adaptive Diffusion Action Plannin (ADAP) algorithm, a simple & scalable approach which iteratively refines its action plan by few real-world trials within a learned prior motion pattern, until reaching a specific goal. Experiments demonstrated that ADAP can learn and accomplish a wide range of goal-conditioned agile dynamic tasks with human-level precision and efficiency directly in real-world, such as throwing a basketball into the hoop in fewer than 10 trials. Project website:https://adap-robotics.github.io/ .
Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, despite leveraging powerful pre-trained Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key capabilities during fine-tuning as the model adapts to specific robotic tasks. We argue that a generalizable VLA model should retain and expand upon the VLM's core competencies: 1) Open-world embodied reasoning - the VLA should inherit the knowledge from VLM, i.e., recognize anything that the VLM can recognize, capable of solving math problems, possessing visual-spatial intelligence, 2) Reasoning following - effectively translating the open-world reasoning into actionable steps for the robot. In this work, we introduce ChatVLA-2, a novel mixture-of-expert VLA model coupled with a specialized three-stage training pipeline designed to preserve the VLM's original strengths while enabling actionable reasoning. To validate our approach, we design a math-matching task wherein a robot interprets math problems written on a whiteboard and picks corresponding number cards from a table to solve equations. Remarkably, our method exhibits exceptional mathematical reasoning and OCR capabilities, despite these abilities not being explicitly trained within the VLA. Furthermore, we demonstrate that the VLA possesses strong spatial reasoning skills, enabling it to interpret novel directional instructions involving previously unseen objects. Overall, our method showcases reasoning and comprehension abilities that significantly surpass state-of-the-art imitation learning methods such as OpenVLA, DexVLA, and pi-zero. This work represents a substantial advancement toward developing truly generalizable robotic foundation models endowed with robust reasoning capacities.
comment: Project page: https://chatvla-2.github.io/
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
We present DexUMI - a data collection and policy learning framework that uses the human hand as the natural interface to transfer dexterous manipulation skills to various robot hands. DexUMI includes hardware and software adaptations to minimize the embodiment gap between the human hand and various robot hands. The hardware adaptation bridges the kinematics gap using a wearable hand exoskeleton. It allows direct haptic feedback in manipulation data collection and adapts human motion to feasible robot hand motion. The software adaptation bridges the visual gap by replacing the human hand in video data with high-fidelity robot hand inpainting. We demonstrate DexUMI's capabilities through comprehensive real-world experiments on two different dexterous robot hand hardware platforms, achieving an average task success rate of 86%.
A Provable Approach for End-to-End Safe Reinforcement Learning
A longstanding goal in safe reinforcement learning (RL) is a method to ensure the safety of a policy throughout the entire process, from learning to operation. However, existing safe RL paradigms inherently struggle to achieve this objective. We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment to address this challenge. Our proposed method learns a policy offline using return-conditioned supervised learning and then deploys the resulting policy while cautiously optimizing a limited set of parameters, known as target returns, using Gaussian processes (GPs). Theoretically, we justify the use of GPs by analyzing the mathematical relationship between target and actual returns. We then prove that PLS finds near-optimal target returns while guaranteeing safety with high probability. Empirically, we demonstrate that PLS outperforms baselines both in safety and reward performance, thereby achieving the longstanding goal to obtain high rewards while ensuring the safety of a policy throughout the lifetime from learning to operation.
comment: 27 pages
Streaming Flow Policy: Simplifying diffusion$/$flow-matching policies by treating action trajectories as flow trajectories ICRA 2025
Recent advances in diffusion$/$flow-matching policies have enabled imitation learning of complex, multi-modal action trajectories. However, they are computationally expensive because they sample a trajectory of trajectories: a diffusion$/$flow trajectory of action trajectories. They discard intermediate action trajectories, and must wait for the sampling process to complete before any actions can be executed on the robot. We simplify diffusion$/$flow policies by treating action trajectories as flow trajectories. Instead of starting from pure noise, our algorithm samples from a narrow Gaussian around the last action. Then, it incrementally integrates a velocity field learned via flow matching to produce a sequence of actions that constitute a single trajectory. This enables actions to be streamed to the robot on-the-fly during the flow sampling process, and is well-suited for receding horizon policy execution. Despite streaming, our method retains the ability to model multi-modal behavior. We train flows that stabilize around demonstration trajectories to reduce distribution shift and improve imitation learning performance. Streaming flow policy outperforms prior methods while enabling faster policy execution and tighter sensorimotor loops for learning-based robot control. Project website: https://streaming-flow-policy.github.io/
comment: ICRA 2025 Beyond Pick and Place Workshop
Spring-Brake! Handed Shearing Auxetics Improve Efficiency of Hopping and Standing
Energy efficiency is critical to the success of legged robotics. Efficiency is lost through wasted energy during locomotion and standing. Including elastic elements has been shown to reduce movement costs, while including breaks can reduce standing costs. However, adding separate elements for each increases the mass and complexity of a leg, reducing overall system performance. Here we present a novel compliant mechanism using a Handed Shearing Auxetic (HSA) that acts as a spring and break in a monopod hopping robot. The HSA acts as a parallel elastic actuator, reducing electrical power for dynamic hopping and matching the efficiency of state-of-the-art compliant hoppers. The HSA\u2019s auxetic behavior enables dual functionality. During static tasks, it locks under large forces with minimal input power by blocking deformation, creating high friction similar to a capstan mechanism. This allows the leg to support heavy loads without motor torque, addressing thermal inefficiency. The multi-functional design enhances both dynamic and static performance, offering a versatile solution for robotic applications.
comment: This work has been submitted to the IEEE for possible publication
TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects
Real-time tracking of previously unseen, highly dynamic objects in contact-rich environments -- such as during dexterous in-hand manipulation -- remains a significant challenge. Purely vision-based tracking often suffers from heavy occlusions due to the frequent contact interactions and motion blur caused by abrupt motion during contact impacts. We propose TwinTrack, a physics-aware visual tracking framework that enables robust and real-time 6-DoF pose tracking of unknown dynamic objects in a contact-rich scene by leveraging the contact physics of the observed scene. At the core of TwinTrack is an integration of Real2Sim and Sim2Real. In Real2Sim, we combine the complementary strengths of vision and contact physics to estimate object's collision geometry and physical properties: object's geometry is first reconstructed from vision, then updated along with other physical parameters from contact dynamics for physical accuracy. In Sim2Real, robust pose estimation of the object is achieved by adaptive fusion between visual tracking and prediction of the learned contact physics. TwinTrack is built on a GPU-accelerated, deeply customized physics engine to ensure real-time performance. We evaluate our method on two contact-rich scenarios: object falling with rich contact impacts against the environment, and contact-rich in-hand manipulation. Experimental results demonstrate that, compared to baseline methods, TwinTrack achieves significantly more robust, accurate, and real-time 6-DoF tracking in these challenging scenarios, with tracking speed exceeding 20 Hz. Project page: https://irislab.tech/TwinTrack-webpage/
Semantic Exploration and Dense Mapping of Complex Environments using Ground Robots Equipped with LiDAR and Panoramic Camera
This paper presents a system for autonomous semantic exploration and dense semantic target mapping of a complex unknown environment using a ground robot equipped with a LiDAR-panoramic camera suite. Existing approaches often struggle to balance collecting high-quality observations from multiple view angles and avoiding unnecessary repetitive traversal. To fill this gap, we propose a complete system combining mapping and planning. We first redefine the task as completing both geometric coverage and semantic viewpoint observation. We then manage semantic and geometric viewpoints separately and propose a novel Priority-driven Decoupled Local Sampler to generate local viewpoint sets. This enables explicit multi-view semantic inspection and voxel coverage without unnecessary repetition. Building on this, we develop a hierarchical planner to ensure efficient global coverage. In addition, we propose a Safe Aggressive Exploration State Machine, which allows aggressive exploration behavior while ensuring the robot's safety. Our system includes a plug-and-play semantic target mapping module that integrates seamlessly with state-of-the-art SLAM algorithms for pointcloud-level dense semantic target mapping. We validate our approach through extensive experiments in both realistic simulations and complex real-world environments. Simulation results show that our planner achieves faster exploration and shorter travel distances while guaranteeing a specified number of multi-view inspections. Real-world experiments further confirm the system's effectiveness in achieving accurate dense semantic object mapping of unstructured environments.
Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation ICRA 2025
In order to navigate safely and reliably in off-road and unstructured environments, robots must detect anomalies that are out-of-distribution (OOD) with respect to the training data. We present an analysis-by-synthesis approach for pixel-wise anomaly detection without making any assumptions about the nature of OOD data. Given an input image, we use a generative diffusion model to synthesize an edited image that removes anomalies while keeping the remaining image unchanged. Then, we formulate anomaly detection as analyzing which image segments were modified by the diffusion model. We propose a novel inference approach for guided diffusion by analyzing the ideal guidance gradient and deriving a principled approximation that bootstraps the diffusion model to predict guidance gradients. Our editing technique is purely test-time that can be integrated into existing workflows without the need for retraining or fine-tuning. Finally, we use a combination of vision-language foundation models to compare pixels in a learned feature space and detect semantically meaningful edits, enabling accurate anomaly detection for off-road navigation. Project website: https://siddancha.github.io/anomalies-by-diffusion-synthesis/
comment: Presented at ICRA 2025
Dynamic Task Adaptation for Multi-Robot Manufacturing Systems with Large Language Models
Recent manufacturing systems are increasingly adopting multi-robot collaboration to handle complex and dynamic environments. While multi-agent architectures support decentralized coordination among robot agents, they often face challenges in enabling real-time adaptability for unexpected disruptions without predefined rules. Recent advances in large language models offer new opportunities for context-aware decision-making to enable adaptive responses to unexpected changes. This paper presents an initial exploratory implementation of a large language model-enabled control framework for dynamic task reassignment in multi-robot manufacturing systems. A central controller agent leverages the large language model's ability to interpret structured robot configuration data and generate valid reassignments in response to robot failures. Experiments in a real-world setup demonstrate high task success rates in recovering from failures, highlighting the potential of this approach to improve adaptability in multi-robot manufacturing systems.
Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields
We explore the use of Artificial Potential Fields (APFs) to solve Multi-Agent Path Finding (MAPF) and Lifelong MAPF (LMAPF) problems. In MAPF, a team of agents must move to their goal locations without collisions, whereas in LMAPF, new goals are generated upon arrival. We propose methods for incorporating APFs in a range of MAPF algorithms, including Prioritized Planning, MAPF-LNS2, and Priority Inheritance with Backtracking (PIBT). Experimental results show that using APF is not beneficial for MAPF but yields up to a 7-fold increase in overall system throughput for LMAPF.
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, visuospatial cognition - reasoning about spatial layouts, relations, and dynamics - remains a significant challenge. Existing models often lack the necessary architectural components and specialized training data for fine-grained spatial understanding. We introduce ViCA2 (Visuospatial Cognitive Assistant 2), a novel MLLM designed to enhance spatial reasoning. ViCA2 features a dual vision encoder architecture integrating SigLIP for semantics and Hiera for spatial structure, coupled with a token ratio control mechanism for efficiency. We also developed ViCA-322K, a new large-scale dataset with over 322,000 spatially grounded question-answer pairs for targeted instruction tuning. On the challenging VSI-Bench benchmark, our ViCA2-7B model achieves a state-of-the-art average score of 56.8, significantly surpassing larger open-source models (e.g., LLaVA-NeXT-Video-72B, 40.9) and leading proprietary models (Gemini-1.5 Pro, 45.4). This demonstrates the effectiveness of our approach in achieving strong visuospatial intelligence with a compact model. We release ViCA2, its codebase, and the ViCA-322K dataset to facilitate further research.
comment: In version 1, Hidetoshi Shimodaira was included as a co-author without their consent and has been removed from the author list
Visuospatial Cognitive Assistant
Video-based spatial cognition is vital for robotics and embodied AI but challenges current Vision-Language Models (VLMs). This paper makes two key contributions. First, we introduce ViCA (Visuospatial Cognitive Assistant)-322K, a diverse dataset of 322,003 QA pairs from real-world indoor videos (ARKitScenes, ScanNet, ScanNet++), offering supervision for 3D metadata-grounded queries and video-based complex reasoning. Second, we develop ViCA-7B, fine-tuned on ViCA-322K, which achieves new state-of-the-art on all eight VSI-Bench tasks, outperforming existing models, including larger ones (e.g., +26.1 on Absolute Distance). For interpretability, we present ViCA-Thinking-2.68K, a dataset with explicit reasoning chains, and fine-tune ViCA-7B to create ViCA-7B-Thinking, a model that articulates its spatial reasoning. Our work highlights the importance of targeted data and suggests paths for improved temporal-spatial modeling. We release all resources to foster research in robust visuospatial intelligence.
comment: Author list corrected. In version 1, Hidetoshi Shimodaira was included as a co-author without their consent and has been removed from the author list
GET: Goal-directed Exploration and Targeting for Large-Scale Unknown Environments
Object search in large-scale, unstructured environments remains a fundamental challenge in robotics, particularly in dynamic or expansive settings such as outdoor autonomous exploration. This task requires robust spatial reasoning and the ability to leverage prior experiences. While Large Language Models (LLMs) offer strong semantic capabilities, their application in embodied contexts is limited by a grounding gap in spatial reasoning and insufficient mechanisms for memory integration and decision consistency.To address these challenges, we propose GET (Goal-directed Exploration and Targeting), a framework that enhances object search by combining LLM-based reasoning with experience-guided exploration. At its core is DoUT (Diagram of Unified Thought), a reasoning module that facilitates real-time decision-making through a role-based feedback loop, integrating task-specific criteria and external memory. For repeated tasks, GET maintains a probabilistic task map based on a Gaussian Mixture Model, allowing for continual updates to object-location priors as environments evolve.Experiments conducted in real-world, large-scale environments demonstrate that GET improves search efficiency and robustness across multiple LLMs and task settings, significantly outperforming heuristic and LLM-only baselines. These results suggest that structured LLM integration provides a scalable and generalizable approach to embodied decision-making in complex environments.
ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image
Vision-based tactile sensing has been widely used in perception, reconstruction, and robotic manipulation. However, collecting large-scale tactile data remains costly due to the localized nature of sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data, such as simulation and free-form tactile generation, often suffer from unrealistic output and poor transferability to downstream tasks. To address this, we propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. With those physical priors as control input, ControlTac generates physically plausible and varied tactile images that can be used for effective data augmentation. Through experiments on three downstream tasks, we demonstrate that ControlTac can effectively augment tactile datasets and lead to consistent gains. Our three real-world experiments further validate the practical utility of our approach. Project page: https://dongyuluo.github.io/controltac.
comment: 22 pages, 11 figures, 7 tables
HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval
We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion of the human hand from the hand demonstration and retrieves robot sub-trajectories in two stages: first filtering by visual similarity, then retrieving trajectories with similar behaviors to the hand. Fine-tuning a policy on the retrieved data enables real-time learning of tasks in under four minutes, without requiring calibrated cameras or detailed hand pose estimation. Experiments also show that HAND outperforms retrieval baselines by over 2x in average task success rates on real robots. Videos can be found at our project website: https://liralab.usc.edu/handretrieval/.
JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning
Recent advances in model-based reinforcement learning (MBRL) have achieved super-human level performance on the Atari100k benchmark, driven by reinforcement learning agents trained on powerful diffusion world models. However, we identify that the current aggregates mask a major performance asymmetry: MBRL agents dramatically outperform humans in some tasks despite drastically underperforming in others, with the former inflating the aggregate metrics. This is especially pronounced in pixel-based agents trained with diffusion world models. In this work, we address the pronounced asymmetry observed in pixel-based agents as an initial attempt to reverse the worrying upward trend observed in them. We address the problematic aggregates by delineating all tasks as Agent-Optimal or Human-Optimal and advocate for equal importance on metrics from both sets. Next, we hypothesize this pronounced asymmetry is due to the lack of temporally-structured latent space trained with the World Model objective in pixel-based methods. Lastly, to address this issue, we propose Joint Embedding DIffusion (JEDI), a novel latent diffusion world model trained end-to-end with the self-consistency objective. JEDI outperforms SOTA models in human-optimal tasks while staying competitive across the Atari100k benchmark, and runs 3 times faster with 43% lower memory than the latest pixel-based diffusion baseline. Overall, our work rethinks what it truly means to cross human-level performance in Atari100k.
comment: Preprint
Robust Localization, Mapping, and Navigation for Quadruped Robots
Quadruped robots are currently a widespread platform for robotics research, thanks to powerful Reinforcement Learning controllers and the availability of cheap and robust commercial platforms. However, to broaden the adoption of the technology in the real world, we require robust navigation stacks relying only on low-cost sensors such as depth cameras. This paper presents a first step towards a robust localization, mapping, and navigation system for low-cost quadruped robots. In pursuit of this objective we combine contact-aided kinematic, visual-inertial odometry, and depth-stabilized vision, enhancing stability and accuracy of the system. Our results in simulation and two different real-world quadruped platforms show that our system can generate an accurate 2D map of the environment, robustly localize itself, and navigate autonomously. Furthermore, we present in-depth ablation studies of the important components of the system and their impact on localization accuracy. Videos, code, and additional experiments can be found on the project website: https://sites.google.com/view/low-cost-quadruped-slam
comment: 8 Pages
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Vision-language-action (VLA) models have shown promise as generalist robotic policies by jointly leveraging visual, linguistic, and proprioceptive modalities to generate action trajectories. While recent benchmarks have advanced VLA research in domestic tasks, professional science-oriented domains remain underexplored. We introduce AutoBio, a simulation framework and benchmark designed to evaluate robotic automation in biology laboratory environments--an application domain that combines structured protocols with demanding precision and multimodal interaction. AutoBio extends existing simulation capabilities through a pipeline for digitizing real-world laboratory instruments, specialized physics plugins for mechanisms ubiquitous in laboratory workflows, and a rendering stack that support dynamic instrument interfaces and transparent materials through physically based rendering. Our benchmark comprises biologically grounded tasks spanning three difficulty levels, enabling standardized evaluation of language-guided robotic manipulation in experimental protocols. We provide infrastructure for demonstration generation and seamless integration with VLA models. Baseline evaluations with two SOTA VLA models reveal significant gaps in precision manipulation, visual reasoning, and instruction following in scientific workflows. By releasing AutoBio, we aim to catalyze research on generalist robotic systems for complex, high-precision, and multimodal professional environments. The simulator and benchmark are publicly available to facilitate reproducible research.
Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks
In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, multiple approaches employing pre-trained large language and vision models have been proposed for this task. However, they are computationally demanding and require careful fine-tuning of the produced outputs. A more lightweight alternative would be the implementation of multimodal Variational Autoencoders (VAEs) which can extract the latent features of the data and integrate them into a joint representation, as has been demonstrated mostly on image-image or image-text data for the state-of-the-art models. Here we explore whether and how can multimodal VAEs be employed in unsupervised robotic manipulation tasks in a simulated environment. Based on the obtained results, we propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%. Moreover, we systematically evaluate the challenges raised by the individual tasks such as object or robot position variability, number of distractors or the task length. Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories based on vision and language.
comment: 7 pages, 5 figures, 2 tables, conference
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Contact-rich manipulation plays an important role in daily human activities. However, uncertain physical parameters often pose significant challenges for both planning and control. A promising strategy is to develop policies that are robust across a wide range of parameters. Domain adaptation and domain randomization are widely used, but they tend to either limit generalization to new instances or perform conservatively due to neglecting instance-specific information. \textit{Explicit motor adaptation} addresses these issues by estimating system parameters online and then retrieving the parameter-conditioned policy from a parameter-augmented base policy. However, it typically requires precise system identification or additional training of a student policy, both of which are challenging in contact-rich manipulation tasks with diverse physical parameters. In this work, we propose \textit{implicit motor adaptation}, which enables parameter-conditioned policy retrieval given a roughly estimated parameter distribution instead of a single estimate. We leverage tensor train as an implicit representation of the base policy, facilitating efficient retrieval of the parameter-conditioned policy by exploiting the separable structure of tensor cores. This framework eliminates the need for precise system estimation and policy retraining while preserving optimal behavior and strong generalization. We provide a theoretical analysis to validate the approach, supported by numerical evaluations on three contact-rich manipulation primitives. Both simulation and real-world experiments demonstrate its ability to generate robust policies across diverse instances. Project website: \href{https://sites.google.com/view/implicit-ma}{https://sites.google.com/view/implicit-ma}.
Adaptive Distance Functions via Kelvin Transformation ICRA 2025
The term safety in robotics is often understood as a synonym for avoidance. Although this perspective has led to progress in path planning and reactive control, a generalization of this perspective is necessary to include task semantics relevant to contact-rich manipulation tasks, especially during teleoperation and to ensure the safety of learned policies. We introduce the semantics-aware distance function and a corresponding computational method based on the Kelvin Transformation. This allows us to compute smooth distance approximations in an unbounded domain by instead solving a Laplace equation in a bounded domain. The semantics-aware distance generalizes signed distance functions by allowing the zero level set to lie inside of the object in regions where contact is allowed, effectively incorporating task semantics, such as object affordances, in an adaptive implicit representation of safe sets. In numerical experiments we show the computational viability of our method for real applications and visualize the computed function on a wrench with various semantic regions.
comment: Accepted to IEEE ICRA 2025
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
Sample Efficient Robot Learning in Supervised Effect Prediction Tasks
In self-supervised robotic learning, agents acquire data through active interaction with their environment, incurring costs such as energy use, human oversight, and experimental time. To mitigate these, sample-efficient exploration is essential. While intrinsic motivation (IM) methods like learning progress (LP) are widely used in robotics, and active learning (AL) is well established for classification in machine learning, few frameworks address continuous, high-dimensional regression tasks typical of world model learning. We propose MUSEL (Model Uncertainty for Sample-Efficient Learning), a novel AL framework tailored for regression tasks in robotics, such as action-effect prediction. MUSEL introduces a model uncertainty metric that combines total predictive uncertainty, learning progress, and input diversity to guide data acquisition. We validate our approach using a Stochastic Variational Deep Kernel Learning (SVDKL) model in two robotic tabletop tasks. Experimental results demonstrate that MUSEL improves both learning accuracy and sample efficiency, validating its effectiveness in learning action effects and selecting informative samples.
comment: 15 pages, 6 figures
A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems ICML 2025
As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.
comment: 22 Pages, 12 Figures. First two authors have contributed equally. Accepted at ICML 2025
NanoSLAM: Enabling Fully Onboard SLAM for Tiny Robots
Perceiving and mapping the surroundings are essential for enabling autonomous navigation in any robotic platform. The algorithm class that enables accurate mapping while correcting the odometry errors present in most robotics systems is Simultaneous Localization and Mapping (SLAM). Today, fully onboard mapping is only achievable on robotic platforms that can host high-wattage processors, mainly due to the significant computational load and memory demands required for executing SLAM algorithms. For this reason, pocket-size hardware-constrained robots offload the execution of SLAM to external infrastructures. To address the challenge of enabling SLAM algorithms on resource-constrained processors, this paper proposes NanoSLAM, a lightweight and optimized end-to-end SLAM approach specifically designed to operate on centimeter-size robots at a power budget of only 87.9 mW. We demonstrate the mapping capabilities in real-world scenarios and deploy NanoSLAM on a nano-drone weighing 44 g and equipped with a novel commercial RISC-V low-power parallel processor called GAP9. The algorithm is designed to leverage the parallel capabilities of the RISC-V processing cores and enables mapping of a general environment with an accuracy of 4.5 cm and an end-to-end execution time of less than 250 ms.
comment: 24 pages
A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension
Intraoperative hypotension (IOH) prediction using past physiological signals is crucial, as IOH may lead to inadequate organ perfusion and significantly elevate the risk of severe complications and mortality. However, current methods often rely on static modeling, overlooking the complex temporal dependencies and the inherently non-stationary nature of physiological signals. We propose a Hybrid Multi-Factor (HMF) network that formulates IOH prediction as a dynamic sequence forecasting task, explicitly capturing both temporal dependencies and physiological non-stationarity. We represent signal dynamics as multivariate time series and decompose them into trend and seasonal components, enabling separate modeling of long-term and periodic variations. Each component is encoded with a patch-based Transformer to balance computational efficiency and feature representation. To address distributional drift from evolving signals, we introduce a symmetric normalization mechanism. Experiments on both public and real-world clinical datasets show that HMF significantly outperforms competitive baselines. We hope HMF offers new insights into IOH prediction and ultimately promotes safer surgical care. Our code is available at https://github.com/Mingyue-Cheng/HMF.
Exploring Remote Collaborative Tasks: The Impact of Avatar Representation on Dyadic Haptic Interactions in Shared Virtual Environments
This study is the first to explore the interplay between haptic interaction and avatar representation in Shared Virtual Environments (SVEs). Specifically, how these factors shape users' sense of social presence during dyadic collaborations, while assessing potential effects on task performance. In a series of experiments, participants performed the collaborative task with haptic interaction under four avatar representation conditions: avatars of both participant and partner were displayed, only the participant's avatar was displayed, only the partner's avatar was displayed, and no avatars were displayed. The study finds that avatar representation, especially of the partner, significantly enhances the perception of social presence, which haptic interaction alone does not fully achieve. However, neither the presence nor the type of avatar representation impacts the task performance or participants' force effort of the task, suggesting that haptic interaction provides sufficient interaction cues for the execution of the task. These results underscore the significance of integrating both visual and haptic modalities to optimize remote collaboration experiences in virtual environments, ensuring effective communication and a strong sense of social presence.
Latent Weight Diffusion: Generating reactive policies instead of trajectories
With the increasing availability of open-source robotic data, imitation learning has emerged as a viable approach for both robot manipulation and locomotion. Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes with a cost, namely, larger model size and slower inference. This is especially an issue for robotic tasks that require high control frequency. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (DP), a popular model for generating trajectories: fewer diffusion queries accumulate greater trajectory tracking errors. For these reasons, it is common practice to run these models at high inference frequency, subject to robot computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion to generate closed-loop policies (weights for neural policies) for robotic tasks, rather than generating trajectories. Learning the behavior distribution through parameter space over trajectory space offers two key advantages: longer action horizons (fewer diffusion queries) & robustness to perturbations while retaining high performance; and a lower inference compute cost. To this end, we show that LWD has higher success rates than DP when the action horizon is longer and when stochastic perturbations exist in the environment. Furthermore, LWD achieves multitask performance comparable to DP while requiring just ~1/45th of the inference-time FLOPS
Adapting Gait Frequency for Posture-regulating Humanoid Push-recovery via Hierarchical Model Predictive Control ICRA 2025
Current humanoid push-recovery strategies often use whole-body motion, yet they tend to overlook posture regulation. For instance, in manipulation tasks, the upper body may need to stay upright and have minimal recovery displacement. This paper introduces a novel approach to enhancing humanoid push-recovery performance under unknown disturbances and regulating body posture by tailoring the recovery stepping strategy. We propose a hierarchical-MPC-based scheme that analyzes and detects instability in the prediction window and quickly recovers through adapting gait frequency. Our approach integrates a high-level nonlinear MPC, a posture-aware gait frequency adaptation planner, and a low-level convex locomotion MPC. The planners predict the center of mass (CoM) state trajectories that can be assessed for precursors of potential instability and posture deviation. In simulation, we demonstrate improved maximum recoverable impulse by 131% on average compared with baseline approaches. In hardware experiments, a 125 ms advancement in recovery stepping timing/reflex has been observed with the proposed approach. We also demonstrate improved push-recovery performance and minimized body attitude change under 0.2 rad.
comment: 7 pages, 6 figures, accepted to ICRA 2025
Gait-Net-augmented Implicit Kino-dynamic MPC for Dynamic Variable-frequency Humanoid Locomotion over Discrete Terrains
Reduced-order-model-based optimal control techniques for humanoid locomotion struggle to adapt step duration and placement simultaneously in dynamic walking gaits due to their reliance on fixed-time discretization, which limits responsiveness to various disturbances and results in suboptimal performance in challenging conditions. In this work, we propose a Gait-Net-augmented implicit kino-dynamic model-predictive control (MPC) to simultaneously optimize step location, step duration, and contact forces for natural variable-frequency locomotion. The proposed method incorporates a Gait-Net-augmented Sequential Convex MPC algorithm to solve multi-linearly constrained variables by iterative quadratic programs. At its core, a lightweight Gait-frequency Network (Gait-Net) determines the preferred step duration in terms of variable MPC sampling times, simplifying step duration optimization to the parameter level. Additionally, it enhances and updates the spatial reference trajectory within each sequential iteration by incorporating local solutions, allowing the projection of kinematic constraints to the design of reference trajectories. We validate the proposed algorithm in high-fidelity simulations and on small-size humanoid hardware, demonstrating its capability for variable-frequency and 3-D discrete terrain locomotion with only a one-step preview of terrain data.
comment: 15 pages, 13 figures, RSS 2025 accepted
MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator with Multi-Epoch Outlier Rejection
The LiDAR-inertial odometry (LIO) and the ultra-wideband (UWB) have been integrated together to achieve driftless positioning in global navigation satellite system (GNSS)-denied environments. However, the UWB may be affected by systematic range errors (such as the clock drift and the antenna phase center offset) and non-line-of-sight (NLOS) signals, resulting in reduced robustness. In this study, we propose a UWB-LiDAR-inertial estimator (MR-ULINS) that tightly integrates the UWB range, LiDAR frame-to-frame, and IMU measurements within the multi-state constraint Kalman filter (MSCKF) framework. The systematic range errors are precisely modeled to be estimated and compensated online. Besides, we propose a multi-epoch outlier rejection algorithm for UWB NLOS by utilizing the relative accuracy of the LIO. Specifically, the relative trajectory of the LIO is employed to verify the consistency of all range measurements within the sliding window. Extensive experiment results demonstrate that MR-ULINS achieves a positioning accuracy of around 0.1 m in complex indoor environments with severe NLOS interference. Ablation experiments show that the online estimation and multi-epoch outlier rejection can effectively improve the positioning accuracy. Besides, MR-ULINS maintains high accuracy and robustness in LiDAR-degenerated scenes and UWB-challenging conditions with spare base stations.
comment: 8 pages, 9 figures
MSC-LIO: An MSCKF-Based LiDAR-Inertial Odometry with Same-Plane Cluster Tracking
The multi-state constraint Kalman filter (MSCKF) has been proven to be more efficient than graph optimization for visual-based odometry while with similar accuracy. However, it has not been adequately considered and studied for LiDAR-based odometry. In this paper, we propose a novel tightly-coupled LiDAR-inertial odometry based on the MSCKF framework, named MSC-LIO. An efficient LiDAR same-plane cluster (LSPC) tracking method, without explicit feature extraction, is present for frame-to-frame data associations. The tracked LSPC is used to build an LSPC measurement model that constructs multi-state constraints. Besides, we propose an effective point-velocity-based LiDAR-IMU time-delay (LITD) estimation method, which is derived from the proposed LSPC tracking method. To validate the effectiveness and robustness of the proposed method, we conducted extensive experiments on both public datasets and real-world environments. The results demonstrate that the proposed MSC-LIO yields higher accuracy and efficiency compared to the state-of-the-art methods. Ablation experiments indicate that the data-association efficiency is improved by nearly 3 times with the LSPC tracking, and the proposed LITD estimation method can effectively and accurately estimate the LITD. Besides, MSC-LIO was implemented on an edge device and demonstrated excellent real-time performance.
comment: 11 pages, 12 figures, 8 tables
Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning
Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
comment: 17 Pages. Website: https://younggyo.me/cqn-as/
CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios SC 2025
With the increasing demand for heterogeneous Unmanned Aerial Vehicle (UAV) swarms to perform complex tasks in urban environments, system design now faces major challenges, including efficient semantic understanding, flexible task planning, and the ability to dynamically adjust coordination strategies in response to evolving environmental conditions and continuously changing task requirements. To address the limitations of existing approaches, this paper proposes coordination field agentic system for coordinating heterogeneous UAV swarms in complex urban scenarios. In this system, large language models (LLMs) is responsible for interpreting high-level human instructions and converting them into executable commands for the UAV swarms, such as patrol and target tracking. Subsequently, a Coordination field mechanism is proposed to guide UAV motion and task selection, enabling decentralized and adaptive allocation of emergent tasks. A total of 50 rounds of comparative testing were conducted across different models in a 2D simulation space to evaluate their performance. Experimental results demonstrate that the proposed system achieves superior performance in terms of task coverage, response time, and adaptability to dynamic changes.
comment: Submitted ITSC 2025
Survival of the fastest -- algorithm-guided evolution of light-powered underwater microrobots
Depending on multiple parameters, soft robots can exhibit different modes of locomotion that are difficult to model numerically. As a result, improving their performance is complex, especially in small-scale systems characterized by low Reynolds numbers, when multiple aero- and hydrodynamical processes influence their movement. In this work, we optimize light-powered millimetre-scale underwater swimmer locomotion by applying experimental results - measured swimming speed - as the fitness function in two evolutionary algorithms: particle swarm optimization and genetic algorithm. As these soft, light-powered robots with different characteristics (phenotypes) can be fabricated quickly, they provide a great platform for optimisation experiments, using many competing robots to improve swimming speed over consecutive generations. Interestingly, just like in natural evolution, unexpected gene combinations led to surprisingly good results, including eight-fold increase in speed or the discovery of a self-oscillating underwater locomotion mode.
comment: 14 pages
GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats
Accurate temporal reconstructions of plant growth are essential for plant phenotyping and breeding, yet remain challenging due to complex geometries, occlusions, and non-rigid deformations of plants. We present a novel framework for building temporal digital twins of plants by combining 3D Gaussian Splatting with a robust sample alignment pipeline. Our method begins by reconstructing Gaussian Splats from multi-view camera data, then leverages a two-stage registration approach: coarse alignment through feature-based matching and Fast Global Registration, followed by fine alignment with Iterative Closest Point. This pipeline yields a consistent 4D model of plant development in discrete time steps. We evaluate the approach on data from the Netherlands Plant Eco-phenotyping Center, demonstrating detailed temporal reconstructions of Sequoia and Quinoa species. Videos and Images can be seen at https://berkeleyautomation.github.io/GrowSplat/
Stochastic Adaptive Estimation in Polynomial Curvature Shape State Space for Continuum Robots
In continuum robotics, real-time robust shape estimation is crucial for planning and control tasks that involve physical manipulation in complex environments. In this paper, we present a novel stochastic observer-based shape estimation framework designed specifically for continuum robots. The shape state space is uniquely represented by the modal coefficients of a polynomial, enabled by leveraging polynomial curvature kinematics (PCK) to describe the curvature distribution along the arclength. Our framework processes noisy measurements from limited discrete position, orientation, or pose sensors to estimate the shape state robustly. We derive a novel noise-weighted observability matrix, providing a detailed assessment of observability variations under diverse sensor configurations. To overcome the limitations of a single model, our observer employs the Interacting Multiple Model (IMM) method, coupled with Extended Kalman Filters (EKFs), to mix polynomial curvature models of different orders. The IMM approach, rooted in Markov processes, effectively manages multiple model scenarios by dynamically adapting to different polynomial orders based on real-time model probabilities. This adaptability is key to ensuring robust shape estimation of the robot's behaviors under various conditions. Our comprehensive analysis, supported by both simulation studies and experimental validations, confirms the robustness and accuracy of our methods.
comment: 20 pages, submitted to IEEE Transactions on Robotics, under review
Integrated Shape and Force Sensing for Continuum Instruments via Polynomial Curvature Modeling and Torque-Cell-Based Actuation
Continuum instruments are integral to robot-assisted minimally invasive surgery (MIS), with cable-driven mechanisms being the most common. Real-time tension feedback is crucial for precise articulation but remains a challenge in compact actuation unit designs. Additionally, accurate shape and external force sensing of continuum instruments are essential for advanced control and manipulation. This paper presents a compact actuation unit that integrates a torque cell directly into the pulley module to provide real-time tension feedback. Building on this unit, we propose a novel shape-force sensing framework incorporating polynomial curvature kinematics and virtual-work-based force sensing model. The framework combines pose sensor measurements at the instrument tip and actuation tension feedback at the developed actuation unit. Experimental results demonstrate the improved performance of the framework in terms of shape reconstruction accuracy and force estimation reliability compared to conventional constant curvature methods.
Reevaluation of Large Neighborhood Search for MAPF: Findings and Opportunities
Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability, leading to a surge of methods, especially those leveraging machine learning, to enhance neighborhood selection. However, several pitfalls exist and hinder a comprehensive evaluation of these new methods, which mainly include: 1) Lower than actual or incorrect baseline performance; 2) Lack of a unified evaluation setting and criterion; 3) Lack of a codebase or executable model for supervised learning methods. To address these challenges, we introduce a unified evaluation framework, implement prior methods, and conduct an extensive comparison of prominent methods. Our evaluation reveals that rule-based heuristics serve as strong baselines, while current learning-based methods show no clear advantage on time efficiency or improvement capacity. Our extensive analysis also opens up new research opportunities for improving MAPF-LNS, such as targeting high-delayed agents, applying contextual algorithms, optimizing replan order and neighborhood size, where machine learning can potentially be integrated. Code and data are available at https://github.com/ChristinaTan0704/mapf-lns-unified.
comment: International Symposium on Combinatorial Search (SoCS) 2025
Multiagent Systems
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym ICAPS 2025
In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, there is a lack of a tool that enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To bridge this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym's design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.
comment: Accepted to Proceedings of ICAPS 2025
From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven human user agent exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. Nevertheless, the question of how agents should be structurally organized for optimal cooperation remains largely unexplored. In this position paper, we aim to gently redirect the focus of the MAS research community toward this critical dimension: develop topology-aware MASs for specific tasks. Specifically, the system consists of three core components - agents, communication links, and communication patterns - that collectively shape its coordination performance and efficiency. To this end, we introduce a systematic, three-stage framework: agent selection, structure profiling, and topology synthesis. Each stage would trigger new research opportunities in areas such as language models, reinforcement learning, graph learning, and generative modeling; together, they could unleash the full potential of MASs in complicated real-world applications. Then, we discuss the potential challenges and opportunities in the evaluation of multiple systems. We hope our perspective and framework can offer critical new insights in the era of agentic AI.
Voice CMS: updating the knowledge base of a digital assistant through conversation
In this study, we propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS), with a focus on understanding the relationship between user preferences and the complexity of the information being provided. The findings demonstrate that, while the overall usability of the VUI is rated lower than the graphical interface, it is already preferred by users for less complex tasks. Furthermore, the quality of content entered through the VUI is comparable to that achieved with the graphical interface, even for highly complex tasks. Obtained qualitative results suggest that a hybrid interface combining the strengths of both approaches could address the key challenges identified during the experiment, such as reducing cognitive load through graphical feedback while maintaining the intuitive nature of voice-based interactions. This work highlights the potential of conversational interfaces as a viable and effective method for knowledge management in specific business contexts.
Efficient Leave-one-out Approximation in LLM Multi-agent Debate Based on Introspection
Multi-agent systems based on large language models (LLMs) advance automatic task completion in various fields, where debate is a common cooperation form for agents to solve complicated problems with reasoning and cross-review to solidify answers. Assessing the individual contributions of agents within these debates is crucial for system refinement and outcome reliability. Traditional leave-one-out (LOO) method offers a clear framework for evaluating each agent's role but face challenges in LLM-based systems due to high computational costs and associated financial implications. This paper presents introspective-leave-one-out (IntrospecLOO), a simple yet effective prompting for approximation of LOO in LLM-powered multi-agent debates. IntrospecLOO introduces an additional querying round after standard debates, prompting agents to update their answers while ignoring responses from a designated agent. This strategy effectively isolates and gauges each participant's influence at a reduced query complexity compared to the original LOO approaches. Validation through experiments on three benchmark datasets confirms the effectiveness of IntrospecLOO.
Online Fair Division for Personalized $2$-Value Instances
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents, each of whom has an additive valuation function over the goods. Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents. It is known that without any assumptions about the values being severely restricted or coming from a distribution, very strong impossibility results hold in this setting. To bypass the latter, we turn our attention to instances where the valuation functions are restricted. In particular, we study personalized $2$-value instances, where there are only two possible values each agent may have for each good, possibly different across agents, and we show how to obtain worst case guarantees with respect to well-known fairness notions, such as maximin share fairness and envy-freeness up to one (or two) good(s). We suggest a deterministic algorithm that maintains a $1/(2n-1)$-MMS allocation at every time step and show that this is the best possible any deterministic algorithm can achieve if one cares about every single time step; nevertheless, eventually the allocation constructed by our algorithm becomes a $1/4$-MMS allocation. To achieve this, the algorithm implicitly maintains a fragile system of priority levels for all agents. Further, we show that, by allowing some limited access to future information, it is possible to have stronger results with less involved approaches. By knowing the values of goods for $n-1$ time steps into the future, we design a matching-based algorithm that achieves an EF$1$ allocation every $n$ time steps, while always maintaining an EF$2$ allocation. Finally, we show that our results allow us to get the first nontrivial guarantees for additive instances in which the ratio of the maximum over the minimum value an agent has for a good is bounded.
Sentiment Simulation using Generative AI Agents
Traditional sentiment analysis relies on surface-level linguistic patterns and retrospective data, limiting its ability to capture the psychological and contextual drivers of human sentiment. These limitations constrain its effectiveness in applications that require predictive insight, such as policy testing, narrative framing, and behavioral forecasting. We present a robust framework for sentiment simulation using generative AI agents embedded with psychologically rich profiles. Agents are instantiated from a nationally representative survey of 2,485 Filipino respondents, combining sociodemographic information with validated constructs of personality traits, values, beliefs, and socio-political attitudes. The framework includes three stages: (1) agent embodiment via categorical or contextualized encodings, (2) exposure to real-world political and economic scenarios, and (3) generation of sentiment ratings accompanied by explanatory rationales. Using Quadratic Weighted Accuracy (QWA), we evaluated alignment between agent-generated and human responses. Contextualized encoding achieved 92% alignment in replicating original survey responses. In sentiment simulation tasks, agents reached 81%--86% accuracy against ground truth sentiment, with contextualized profile encodings significantly outperforming categorical (p < 0.0001, Cohen's d = 0.70). Simulation results remained consistent across repeated trials (+/-0.2--0.5% SD) and resilient to variation in scenario framing (p = 0.9676, Cohen's d = 0.02). Our findings establish a scalable framework for sentiment modeling through psychographically grounded AI agents. This work signals a paradigm shift in sentiment analysis from retrospective classification to prospective and dynamic simulation grounded in psychology of sentiment formation.
comment: 18 pages, 10 figures
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent system shows great potential in tackling the above issues. However, directly applying it to MM2MA task presents three critical challenges: (1) inadequate fine-grained understanding of multimodal inputs (especially for video), (2) the inability of single models to handle diverse audio events, and (3) the absence of self-correction mechanisms for reliable outputs. To this end, we propose AudioGenie, a novel training-free multi-agent system featuring a dual-layer architecture with a generation team and a supervisor team. For the generation team, a fine-grained task decomposition and an adaptive Mixture-of-Experts (MoE) collaborative entity are designed for dynamic model selection, and a trial-and-error iterative refinement module is designed for self-correction. The supervisor team ensures temporal-spatial consistency and verifies outputs through feedback loops. Moreover, we build MA-Bench, the first benchmark for MM2MA tasks, comprising 198 annotated videos with multi-type audios. Experiments demonstrate that our AudioGenie outperforms state-of-the-art (SOTA) methods across 9 metrics in 8 tasks. User study further validate the effectiveness of the proposed method in terms of quality, accuracy, alignment, and aesthetic. The anonymous project website with samples can be found at https://audiogenie.github.io/.
Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning
In multi-agent reinforcement learning (MARL), effective communication improves agent performance, particularly under partial observability. We propose MARL-CPC, a framework that enables communication among fully decentralized, independent agents without parameter sharing. MARL-CPC incorporates a message learning model based on collective predictive coding (CPC) from emergent communication research. Unlike conventional methods that treat messages as part of the action space and assume cooperation, MARL-CPC links messages to state inference, supporting communication in non-cooperative, reward-independent settings. We introduce two algorithms -Bandit-CPC and IPPO-CPC- and evaluate them in non-cooperative MARL tasks. Benchmarks show that both outperform standard message-as-action approaches, establishing effective communication even when messages offer no direct benefit to the sender. These results highlight MARL-CPC's potential for enabling coordination in complex, decentralized environments.
Properties of zero-determinant strategies in multichannel games
Controlling payoffs in repeated games is one of the important topics in control theory of multi-agent systems. Recently proposed zero-determinant strategies enable players to unilaterally enforce linear relations between payoffs. Furthermore, based on the mathematics of zero-determinant strategies, regional payoff control, in which payoffs are enforced into some feasible regions, has been discovered in social dilemma situations. More recently, theory of payoff control was extended to multichannel games, where players parallelly interact with each other in multiple channels. However, properties of zero-determinant strategies specific to multichannel games are still not clear. In this paper, we elucidate properties of zero-determinant strategies in multichannel games. First, we relate the existence condition of zero-determinant strategies in multichannel games to that of zero-determinant strategies in each channel. We then show that the existence of zero-determinant strategies in multichannel games requires the existence of zero-determinant strategies in some channels. This result implies that the existence of zero-determinant strategies in multichannel games is tightly restricted by structure of games played in each channel.
comment: 12 pages
Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development
Recent advancements in Large Language Models (LLMs) and autonomous agents have demonstrated remarkable capabilities across various domains. However, standalone agents frequently encounter limitations when handling complex tasks that demand extensive interactions and substantial computational resources. Although Multi-Agent Systems (MAS) alleviate some of these limitations through collaborative mechanisms like task decomposition, iterative communication, and role specialization, they typically remain resource-unaware, incurring significant inefficiencies due to high token consumption and excessive execution time. To address these limitations, we propose a resource-aware multi-agent system -- Co-Saving (meaning that multiple agents collaboratively engage in resource-saving activities), which leverages experiential knowledge to enhance operational efficiency and solution quality. Our key innovation is the introduction of "shortcuts" -- instructional transitions learned from historically successful trajectories -- which allows to bypass redundant reasoning agents and expedite the collective problem-solving process. Experiments for software development tasks demonstrate significant advantages over existing methods. Specifically, compared to the state-of-the-art MAS ChatDev, our method achieves an average reduction of 50.85% in token usage, and improves the overall code quality by 10.06%.
comment: Work in Progress
Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation
This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-world data, the simulation models individual behaviors and large-scale mobility patterns in Taipei City. Key insights, such as route heat maps and mode-specific indicators, provide urban planners with actionable information for policy-making. Future work focuses on establishing robust validation frameworks to ensure accuracy and reliability in urban planning applications.
comment: 8 pages, 8 figures. This paper is reviewed and accepted by the CUPUM (Computational Urban Planning and Urban Management) Conference held by University College London (UCL) in 2025
A Large Language Model-Enabled Control Architecture for Dynamic Resource Capability Exploration in Multi-Agent Manufacturing Systems
Manufacturing environments are becoming more complex and unpredictable due to factors such as demand variations and shorter product lifespans. This complexity requires real-time decision-making and adaptation to disruptions. Traditional control approaches highlight the need for advanced control strategies capable of overcoming unforeseen challenges, as they demonstrate limitations in responsiveness within dynamic industrial settings. Multi-agent systems address these challenges through decentralization of decision-making, enabling systems to respond dynamically to operational changes. However, current multi-agent systems encounter challenges related to real-time adaptation, context-aware decision-making, and the dynamic exploration of resource capabilities. Large language models provide the possibility to overcome these limitations through context-aware decision-making capabilities. This paper introduces a large language model-enabled control architecture for multi-agent manufacturing systems to dynamically explore resource capabilities in response to real-time disruptions. A simulation-based case study demonstrates that the proposed architecture improves system resilience and flexibility. The case study findings show improved throughput and efficient resource utilization compared to existing approaches.
Dynamic Task Adaptation for Multi-Robot Manufacturing Systems with Large Language Models
Recent manufacturing systems are increasingly adopting multi-robot collaboration to handle complex and dynamic environments. While multi-agent architectures support decentralized coordination among robot agents, they often face challenges in enabling real-time adaptability for unexpected disruptions without predefined rules. Recent advances in large language models offer new opportunities for context-aware decision-making to enable adaptive responses to unexpected changes. This paper presents an initial exploratory implementation of a large language model-enabled control framework for dynamic task reassignment in multi-robot manufacturing systems. A central controller agent leverages the large language model's ability to interpret structured robot configuration data and generate valid reassignments in response to robot failures. Experiments in a real-world setup demonstrate high task success rates in recovering from failures, highlighting the potential of this approach to improve adaptability in multi-robot manufacturing systems.
Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields
We explore the use of Artificial Potential Fields (APFs) to solve Multi-Agent Path Finding (MAPF) and Lifelong MAPF (LMAPF) problems. In MAPF, a team of agents must move to their goal locations without collisions, whereas in LMAPF, new goals are generated upon arrival. We propose methods for incorporating APFs in a range of MAPF algorithms, including Prioritized Planning, MAPF-LNS2, and Priority Inheritance with Backtracking (PIBT). Experimental results show that using APF is not beneficial for MAPF but yields up to a 7-fold increase in overall system throughput for LMAPF.
Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations
To fully leverage the potential of artificial intelligence (AI) systems in a trustworthy manner, it is desirable to couple multiple AI and non-AI systems together seamlessly for constraining and ensuring correctness of the output. This paper introduces a novel parallel discrete event simulation (PDES) based methodology to combine multiple AI and non-AI agents in a causal, rule-based way. Our approach tightly integrates the concept of passage of time, with each agent considered as an entity in the PDES framework and responding to prior requests from other agents. Such coupling mechanism enables the agents to work in a co-operative environment towards a common goal while many tasks run in parallel throughout the simulation. It further enables setting up boundaries to the outputs of the AI agents by applying necessary dynamic constraints using non-AI agents while allowing for scalability through deployment of hundreds of such agents in a larger compute cluster. Distributing smaller AI agents can enable extremely scalable simulations in the future, addressing local memory bottlenecks for model parameter storage. Within a PDES involving both AI and non-AI agents, we break down the problem at hand into structured steps, when necessary, providing a set of multiple choices to the AI agents, and then progressively solve these steps towards a final goal. At each step, the non-AI agents act as unbiased auditors, verifying each action by the AI agents so that certain rules of engagement are followed. We evaluate our approach by solving four problems from four different domains and comparing the results with those from AI models alone. Our results show greater accuracy in solving problems from various domains where the AI models struggle to solve the problems solely by themselves. Results show that overall accuracy of our approach is 68% where as the accuracy of vanilla models is less than 23%.
Using LLMs to Advance the Cognitive Science of Collectives
LLMs are already transforming the study of individual cognition, but their application to studying collective cognition has been underexplored. We lay out how LLMs may be able to address the complexity that has hindered the study of collectives and raise possible risks that warrant new methods.
Behavioral alignment in social networks
The orderly behaviors observed in large-scale groups, such as fish schooling and the organized movement of crowds, are both ubiquitous and essential for the survival and stability of these systems. Such complex collective behaviors often emerge from simple local interactions and strategy adjustments among individuals. Understanding how these basic rules shape complex group dynamics has long been a significant scientific challenge. Historically, research has predominantly focused on imitation and social learning, where individuals adopt the strategies of more successful peers to refine their behavior. However, in recent years, an alternative learning approach, self-exploration and introspective learning, has garnered increasing attention. In this paradigm, individuals assess their own circumstances and select strategies that best align with their specific conditions. Two primary forms of this learning are coordination and anti-coordination, where individuals align with and diverge from the local majority, respectively. In this study, we analyze networked systems of coordinating and anti-coordinating individuals, exploring the combined effects of system dynamics, network structure, and behavioral patterns. We address several practical questions, including the number of equilibria, their characteristics, the equilibrium time, and the resilience of systems. We find that the number of equilibrium states can be extremely large, even increasing exponentially with minor alternations to the network structure. Moreover, the network structure has a significant impact on the average equilibrium time. Despite the complexity of these findings, variations can be captured by a single, simple network characteristic: the average path length. Our research offers valuable insights into how modifications to the interaction structure can influence behavioral alignment in social networks.
Benchmarking LLMs' Swarm intelligence
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints-limited local perception and communication-remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit-built on a customizable physical system-providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
comment: added new ref
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement ACL 2025
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
comment: ACL 2025 Findings
OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization
Hyperparameter optimization (HPO) is a critical yet challenging aspect of machine learning model development, significantly impacting model performance and generalization. Traditional HPO methods often struggle with high dimensionality, complex interdependencies, and computational expense. This paper introduces OptiMindTune, a novel multi-agent framework designed to intelligently and efficiently optimize hyperparameters. OptiMindTune leverages the collaborative intelligence of three specialized AI agents -- a Recommender Agent, an Evaluator Agent, and a Decision Agent -- each powered by Google's Gemini models. These agents address distinct facets of the HPO problem, from model selection and hyperparameter suggestion to robust evaluation and strategic decision-making. By fostering dynamic interactions and knowledge sharing, OptiMindTune aims to converge to optimal hyperparameter configurations more rapidly and robustly than existing single-agent or monolithic approaches. Our framework integrates principles from advanced large language models, and adaptive search to achieve scalable and intelligent AutoML. We posit that this multi-agent paradigm offers a promising avenue for tackling the increasing complexity of modern machine learning model tuning.
comment: 7 pages, 2 tables
The Complexity of Pure Strategy Relevant Equilibria in Concurrent Games
We study rational synthesis problems for concurrent games with $\omega$-regular objectives. Our model of rationality considers only pure strategy Nash equilibria that satisfy either a social welfare or Pareto optimality condition with respect to an $\omega$-regular objective for each agent. This extends earlier work on equilibria in concurrent games, without consideration about their quality. Our results show that the existence of Nash equilibria satisfying social welfare conditions can be computed as efficiently as the constrained Nash equilibrium existence problem. On the other hand, the existence of Nash equilibria satisfying the Pareto optimality condition possibly involves a higher upper bound, except in the case of B\"uchi and Muller games, for which all three problems are in the classes P and PSPACE-complete, respectively.
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration ACL 2025
Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent's System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and perform reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. Code of DPT-Agent can be found in https://github.com/sjtu-marl/DPT-Agent.
comment: Accepted by ACL 2025 Main. Camera Ready Version
Preference-CFR$\:$ Beyond Nash Equilibrium for Better Game Strategies
Artificial intelligence (AI) has surpassed top human players in a variety of games. In imperfect information games, these achievements have primarily been driven by Counterfactual Regret Minimization (CFR) and its variants for computing Nash equilibrium. However, most existing research has focused on maximizing payoff, while largely neglecting the importance of strategic diversity and the need for varied play styles, thereby limiting AI's adaptability to different user preferences. To address this gap, we propose Preference-CFR (Pref-CFR), a novel method that incorporates two key parameters: preference degree and vulnerability degree. These parameters enable the AI to adjust its strategic distribution within an acceptable performance loss threshold, thereby enhancing its adaptability to a wider range of strategic demands. In our experiments with Texas Hold'em, Pref-CFR successfully trained Aggressive and Loose Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns. Notably, for certain hand scenarios, Pref-CFR produces strategies that diverge significantly from both conventional expert heuristics and original CFR outputs, potentially offering novel insights for professional players.
A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
Traditional Identity and Access Management (IAM) systems, primarily designed for human users or static machine identities via protocols such as OAuth, OpenID Connect (OIDC), and SAML, prove fundamentally inadequate for the dynamic, interdependent, and often ephemeral nature of AI agents operating at scale within Multi Agent Systems (MAS), a computational system composed of multiple interacting intelligent agents that work collectively. This paper posits the imperative for a novel Agentic AI IAM framework: We deconstruct the limitations of existing protocols when applied to MAS, illustrating with concrete examples why their coarse-grained controls, single-entity focus, and lack of context-awareness falter. We then propose a comprehensive framework built upon rich, verifiable Agent Identities (IDs), leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), that encapsulate an agents capabilities, provenance, behavioral scope, and security posture. Our framework includes an Agent Naming Service (ANS) for secure and capability-aware discovery, dynamic fine-grained access control mechanisms, and critically, a unified global session management and policy enforcement layer for real-time control and consistent revocation across heterogeneous agent communication protocols. We also explore how Zero-Knowledge Proofs (ZKPs) enable privacy-preserving attribute disclosure and verifiable policy compliance. We outline the architecture, operational lifecycle, innovative contributions, and security considerations of this new IAM paradigm, aiming to establish the foundational trust, accountability, and security necessary for the burgeoning field of agentic AI and the complex ecosystems they will inhabit.
comment: 24 Pages, 5 figures, 2 tables, edit: added a missed Author
Systems and Control (CS)
Data-Driven Control of Continuous-Time LTI Systems via Non-Minimal Realizations
This article proposes an approach to design output-feedback controllers for unknown continuous-time linear time-invariant systems using only input-output data from a single experiment. To address the lack of state and derivative measurements, we introduce non-minimal realizations whose states can be observed by filtering the available data. We first apply this concept to the disturbance-free case, formulating linear matrix inequalities (LMIs) from batches of sampled signals to design a dynamic, filter-based stabilizing controller. The framework is then extended to the problem of asymptotic tracking and disturbance rejection - in short, output regulation - by incorporating an internal model based on prior knowledge of the disturbance/reference frequencies. Finally, we discuss tuning strategies for a class of multi-input multi-output systems and illustrate the method via numerical examples.
Learning to Pursue AC Optimal Power Flow Solutions with Feasibility Guarantees
This paper focuses on an AC optimal power flow (OPF) problem for distribution feeders equipped with controllable distributed energy resources (DERs). We consider a solution method that is based on a continuous approximation of the projected gradient flow - referred to as the safe gradient flow - that incorporates voltage and current information obtained either through real-time measurements or power flow computations. These two setups enable both online and offline implementations. The safe gradient flow involves the solution of convex quadratic programs (QPs). To enhance computational efficiency, we propose a novel framework that employs a neural network approximation of the optimal solution map of the QP. The resulting method has two key features: (a) it ensures that the DERs' setpoints are practically feasible, even for an online implementation or when an offline algorithm has an early termination; (b) it ensures convergence to a neighborhood of a strict local optimizer of the AC OPF. The proposed method is tested on a 93-node distribution system with realistic loads and renewable generation. The test shows that our method successfully regulates voltages within limits during periods with high renewable generation.
Current trends and future directions in event-based control
The defining characteristic of event-based control is that feedback loops are only closed when indicated by a triggering condition that takes recent information about the system into account. This stands in contrast to periodic control where the feedback loop is closed periodically. Benefits of event-based control arise when sampling comes at a cost, which occurs, e.g., for Networked Control Systems or in other setups with resource constraints. A rapidly growing number of publications deals with event-based control. Nevertheless, some fundamental questions about event-based control are still unsolved. In this article, we provide an overview of current research trends in event-based control. We focus on results that aim for a better understanding of effects that occur in feedback loops with event-based control. Based on this summary, we identify important open directions for future research.
comment: Preprint submitted to the European Journal of Control
Operator-Splitting Methods for Neuromorphic Circuit Simulation
A novel splitting algorithm is proposed for the numerical simulation of neuromorphic circuits. The algorithm is grounded in the operator-theoretic concept of monotonicity, which bears both physical and algorithmic significance. The splitting exploits this correspondence to translate the circuit architecture into the algorithmic architecture. The paper illustrates the many advantages of the proposed operator-theoretic framework over conventional numerical integration for the simulation of multiscale hierarchical events that characterize neuromorphic behaviors.
State and Input Constrained Adaptive Tracking Control of Uncertain Euler-Lagrange Systems with Robustness and Feasibility Analysis
This paper proposes an adaptive tracking controller for uncertain Euler-Lagrange (E-L) systems with user-defined state and input constraints in presence of bounded external disturbances. A barrier Lyapunov function (BLF) is employed for state constraint satisfaction, integrated with a saturated controller that ensures the control input remains within pre-specified bounds. To the best of the authors' knowledge, this is the first result on tracking control of state and input-constrained uncertain E-L systems that provides verifiable conditions for the existence of a feasible control policy. The efficacy of the proposed controller in terms of constraint satisfaction and tracking performance is demonstrated through simulation on a robotic manipulator system.
comment: arXiv admin note: text overlap with arXiv:2212.08495
State Constrained Model Reference Adaptive Control with Input Amplitude and Rate Limits
This paper proposes a robust model reference adaptive controller (MRAC) for uncertain multi-input multi-output (MIMO) linear time-invariant (LTI) plants with user-defined constraints on the plant states, input amplitude, and input rate. The proposed two-layer barrier Lyapunov function (BLF)-based control design considers the input and the input rate as states that are constrained using two BLFs in the first layer, while another BLF in the second layer constrains the plant states. The adaptive control law ensures that the plant states, input amplitude, and input rate remain within the user-defined safe sets despite unmatched bounded disturbances. Sufficient conditions for the existence of a feasible control policy are also provided. To the best of the authors' knowledge, this is the first optimization-free method that imposes user-defined constraints on the state, input, and input rate and also provides verifiable feasibility conditions in the presence of parametric uncertainties and disturbances. Simulation results demonstrate the effectiveness of the proposed algorithm.
A Multi-output Gaussian Process Regression with Negative Transfer Mitigation for Generating Boundary Test Scenarios of Multi-UAV Systems
Adaptive sampling based on Gaussian process regression (GPR) has already been applied with considerable success to generate boundary test scenarios for multi-UAV systems (MUS). One of the key techniques in such researches is leveraging the accurate prediction of the MUS performance through GPR in different test scenarios. Due to the potential correlations among the multiple MUS performance metrics, current researches commonly utilize a multi-output GPR (MOGPR) to model the multiple performance metrics simultaneously. This approach can achieve a more accurate prediction, rather than modeling each metric individually. However, MOGPR still suffers from negative transfer. When the feature of one output variable is incorrectly learned by another, the models training process will be negatively affected, leading to a decline in prediction performance. To solve this problem, this paper proposes a novel adaptive regularization approach into the conventional MOGPR training process. Unlike existing regularization approaches for mitigating negative transfer in MOGPR, our method penalizes the inconsistencies among output-specific characteristic parameters using adaptively adjustable regularization weights. This mechanism helps each set of output parameters avoid local optima. Consequently, it yields simultaneous improvements in predictive accuracy across all outputs. Finally, we validate our approach on a numerical case and on a boundary test scenario generation case for a MUS multi-objectives search task.
comment: Article,29 pages,18 figures
On data usage and predictive behavior of data-driven predictive control with 1-norm regularization
We investigate the data usage and predictive behavior of data-driven predictive control (DPC) with 1-norm regularization. Our analysis enables the offline removal of unused data and facilitates a comparison between the identified symmetric structure and data usage against prior knowledge of the true system. This comparison helps assess the suitability of the DPC scheme for effective control.
comment: This paper is a preprint of a contribution to the IEEE Control Systems Letters. 6 pages, 3 figures
A memristive model of spatio-temporal excitability
This paper introduces a model of excitability that unifies the mechanism of an important neuronal property both in time and in space. As a starting point, we revisit both a key model of temporal excitability, proposed by Hodgkin and Huxley, and a key model of spatial excitability, proposed by Amari. We then propose a novel model that captures the temporal and spatial properties of both models. Our aim is to regard neuronal excitability as a property across scales, and to explore the benefits of modeling excitability with one and the same mechanism, whether at the cellular or the population level.
comment: 7 pages, 9 figures, submitted for CDC 2025
Dynamic State-Feedback Control for LPV Systems: Ensuring Stability and LQR Performance
In this paper, we propose a novel dynamic state-feedback controller for polytopic linear parameter-varying (LPV) systems with constant input matrix. The controller employs a projected gradient flow method to continuously improve its control law and, under established conditions, converges to the optimal feedback gain of the corresponding linear quadratic regulator (LQR) problem associated with constant parameter trajectories. We derive conditions for quadratic stability, which can be verified via convex optimization, to ensure exponential stability of the LPV system even under arbitrarily fast parameter variations. Additionally, we provide sufficient conditions to guarantee the boundedness of the trajectories of the dynamic controller for any parameter trajectory and the convergence of its feedback gains to the optimal LQR gains for constant parameter trajectories. Furthermore, we show that the closed-loop system is asymptotically stable for constant parameter trajectories under these conditions. Simulation results demonstrate that the controller maintains stability and improves transient performance.
comment: submitted to Transactions on Automatic Control
Efficient Dynamic Shielding for Parametric Safety Specifications
Shielding has emerged as a promising approach for ensuring safety of AI-controlled autonomous systems. The algorithmic goal is to compute a shield, which is a runtime safety enforcement tool that needs to monitor and intervene the AI controller's actions if safety could be compromised otherwise. Traditional shields are designed statically for a specific safety requirement. Therefore, if the safety requirement changes at runtime due to changing operating conditions, the shield needs to be recomputed from scratch, causing delays that could be fatal. We introduce dynamic shields for parametric safety specifications, which are succinctly represented sets of all possible safety specifications that may be encountered at runtime. Our dynamic shields are statically designed for a given safety parameter set, and are able to dynamically adapt as the true safety specification (permissible by the parameters) is revealed at runtime. The main algorithmic novelty lies in the dynamic adaptation procedure, which is a simple and fast algorithm that utilizes known features of standard safety shields, like maximal permissiveness. We report experimental results for a robot navigation problem in unknown territories, where the safety specification evolves as new obstacles are discovered at runtime. In our experiments, the dynamic shields took a few minutes for their offline design, and took between a fraction of a second and a few seconds for online adaptation at each step, whereas the brute-force online recomputation approach was up to 5 times slower.
Physical Reduced Stochastic Equations for Continuously Monitored Non-Markovian Quantum Systems with a Markovian Embedding
An effective approach to modeling non-Markovian quantum systems is to embed a principal (quantum) system of interest into a larger quantum system. A widely employed embedding is one that uses another quantum system, referred to as the auxiliary system, which is coupled to the principal system, and both the principal and auxiliary can be coupled to quantum white noise processes. The principal and auxiliary together form a quantum Markov system and the quantum white noises act as a bath (environment) for this system. Recently it was shown that the conditional evolution of the principal system in this embedding under continuous monitoring by a travelling quantum probe can be expressed as a system of coupled stochastic differential equations (SDEs) that involve only operators of the principal system. The reduced conditional state of the principal only (conditioned on the measurement outcomes) are determined by the "diagonal" blocks of this coupled systems of SDEs. It is shown here that the "off-diagonal" blocks can be exactly eliminated up to their initial conditions, leaving a reduced closed system of SDEs for the diagonal blocks only. Under additional conditions the off-diagonal initial conditions can be made to vanish. This new closed system of equations, which includes an integration term involving a two-time stochastic kernel, represents the non-Markovian stochastic dynamics of the principal system under continuous-measurement. The system of equations determine the reduced conditional state of the principal only and may be viewed as a stochastic Nakajima-Zwanzig type of equation for continuously monitored non-Markovian quantum systems.
comment: 7 pages, no figures. Accepted for publication in IEEE Control Systems Letters (http://ieee-cssletters.dei.unipd.it/)
Properties of zero-determinant strategies in multichannel games
Controlling payoffs in repeated games is one of the important topics in control theory of multi-agent systems. Recently proposed zero-determinant strategies enable players to unilaterally enforce linear relations between payoffs. Furthermore, based on the mathematics of zero-determinant strategies, regional payoff control, in which payoffs are enforced into some feasible regions, has been discovered in social dilemma situations. More recently, theory of payoff control was extended to multichannel games, where players parallelly interact with each other in multiple channels. However, properties of zero-determinant strategies specific to multichannel games are still not clear. In this paper, we elucidate properties of zero-determinant strategies in multichannel games. First, we relate the existence condition of zero-determinant strategies in multichannel games to that of zero-determinant strategies in each channel. We then show that the existence of zero-determinant strategies in multichannel games requires the existence of zero-determinant strategies in some channels. This result implies that the existence of zero-determinant strategies in multichannel games is tightly restricted by structure of games played in each channel.
comment: 12 pages
Large Language Models for Solving Economic Dispatch Problem
This paper investigates the capability of off-the-shelf large language models (LLMs) to solve the economic dispatch (ED) problem. ED is a hard-constrained optimization problem solved on a day-ahead timescale by grid operators to minimize electricity generation costs while accounting for physical and engineering constraints. Numerous approaches have been proposed, but these typically require either mathematical formulations, face convergence issues, or depend on extensive labeled data and training time. This work implements LLMs enhanced with reasoning capabilities to address the classic lossless ED problem. The proposed approach avoids the need for explicit mathematical formulations, does not suffer from convergence challenges, and requires neither labeled data nor extensive training. A few-shot learning technique is utilized in two different prompting contexts. The IEEE 118-bus system with 19 generation units serves as the evaluation benchmark. Results demonstrate that various prompting strategies enable LLMs to effectively solve the ED problem, offering a convenient and efficient alternative. Consequently, this approach presents a promising future solution for ED tasks, particularly when foundational power system models are available.
comment: 5 pages, 3 figures, Accepted, 2025 IEEE Energy Conversion Conference and Expo (ECCE 2025), Philadelphia, PA
Online distributed optimization for spatio-temporally constrained real-time peer-to-peer energy trading
The proliferation of distributed renewable energy triggers the peer-to-peer (P2P) energy market formations. To make profits, prosumers equipped with photovoltaic (PV) panels and even the energy storage system (ESS) can actively participate in the real-time P2P energy market and trade energy. However, in real situations, system states such as energy demands and renewable energy power generation are highly uncertain, making it difficult for prosumers to make optimal real-time decisions. Moreover, severe problems with the physical network can arise from the real-time P2P energy trading, such as bus voltage violations and line overload. To handle these problems, this work first formulates the real-time P2P energy trading problem as a spatio-temporally constrained stochastic optimization problem by considering ESS and the spatial physical network constraints. To deal with the uncertainties online, a modified Lyapunov optimization method is innovatively proposed to approximately reformulate the stochastic optimization problem into an online one by relaxing the time-coupling constraints. Compared with the state-of-the-art online methods, the proposed one renders more flexibility and better performance for the real-time P2P energy market operation. Additionally, to protect the prosumers' privacy, an online distributed algorithm based on the consensus alternating direction method of multipliers (ADMM) is developed to solve the reformulated online problem by decoupling the spatial constraints. The theoretical near-optimal performance guarantee of the proposed online distributed algorithm is derived, and its performance can be further improved by minimizing the performance gap. Simulation results demonstrate that the proposed online distributed algorithm can guarantee the fast, stable, and safe long-term operation of the real-time P2P energy market.
A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem
We propose a physics-informed neural networks (PINNs) framework to solve the infinite-horizon optimal control problem of nonlinear systems. In particular, since PINNs are generally able to solve a class of partial differential equations (PDEs), they can be employed to learn the value function of the infinite-horizon optimal control problem via solving the associated steady-state Hamilton-Jacobi-Bellman (HJB) equation. However, an issue here is that the steady-state HJB equation generally yields multiple solutions; hence if PINNs are directly employed to it, they may end up approximating a solution that is different from the optimal value function of the problem. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution, and which uniformly approximates the optimal value function as the horizon increases. An algorithm to verify if the chosen horizon is large enough is also given, as well as a method to extend it -- with reduced computations and robustness to approximation errors -- in case it is not. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions, does not require prior knowledge of a stabilizing controller, and does not perform iterative policy evaluations. Simulations are performed, which verify and clarify theoretical findings.
comment: Accepted with minor revisions at International Journal of Robust and Nonlinear Control
Nonadaptive Output Regulation of Second-Order Nonlinear Uncertain Systems
This paper investigates the robust output regulation problem of second-order nonlinear uncertain systems with an unknown exosystem. Instead of the adaptive control approach, this paper resorts to a robust control methodology to solve the problem and thus avoid the bursting phenomenon. In particular, this paper constructs generic internal models for the steady-state state and input variables of the system. By introducing a coordinate transformation, this paper converts the robust output regulation problem into a nonadaptive stabilization problem of an augmented system composed of the second-order nonlinear uncertain system and the generic internal models. Then, we design the stabilization control law and construct a strict Lyapunov function that guarantees the robustness with respect to unmodeled disturbances. The analysis shows that the output zeroing manifold of the augmented system can be made attractive by the proposed nonadaptive control law, which solves the robust output regulation problem. Finally, we demonstrate the effectiveness of the proposed nonadaptive internal model approach by its application to the control of the Duffing system.
comment: 8 pages, 3 figures
Spring-Brake! Handed Shearing Auxetics Improve Efficiency of Hopping and Standing
Energy efficiency is critical to the success of legged robotics. Efficiency is lost through wasted energy during locomotion and standing. Including elastic elements has been shown to reduce movement costs, while including breaks can reduce standing costs. However, adding separate elements for each increases the mass and complexity of a leg, reducing overall system performance. Here we present a novel compliant mechanism using a Handed Shearing Auxetic (HSA) that acts as a spring and break in a monopod hopping robot. The HSA acts as a parallel elastic actuator, reducing electrical power for dynamic hopping and matching the efficiency of state-of-the-art compliant hoppers. The HSA\u2019s auxetic behavior enables dual functionality. During static tasks, it locks under large forces with minimal input power by blocking deformation, creating high friction similar to a capstan mechanism. This allows the leg to support heavy loads without motor torque, addressing thermal inefficiency. The multi-functional design enhances both dynamic and static performance, offering a versatile solution for robotic applications.
comment: This work has been submitted to the IEEE for possible publication
Local Stability and Region of Attraction Analysis for Neural Network Feedback Systems under Positivity Constraints
We study the local stability of nonlinear systems in the Lur'e form with static nonlinear feedback realized by feedforward neural networks (FFNNs). By leveraging positivity system constraints, we employ a localized variant of the Aizerman conjecture, which provides sufficient conditions for exponential stability of trajectories confined to a compact set. Using this foundation, we develop two distinct methods for estimating the Region of Attraction (ROA): (i) a less conservative Lyapunov-based approach that constructs invariant sublevel sets of a quadratic function satisfying a linear matrix inequality (LMI), and (ii) a novel technique for computing tight local sector bounds for FFNNs via layer-wise propagation of linear relaxations. These bounds are integrated into the localized Aizerman framework to certify local exponential stability. Numerical results demonstrate substantial improvements over existing integral quadratic constraint-based approaches in both ROA size and scalability.
comment: Submitted to 64th IEEE Conference on Decision and Control 2025 - Rio de Janeiro, Brazil
Flexure-FET-Based Receiver with Competitive Binding for Interference Mitigation in Molecular Communication
Molecular communication (MC), a biologically inspired technology, enables applications in nanonetworks and the Internet of Everything (IoE), with great potential for intra-body systems such as drug delivery, health monitoring, and disease detection. This paper extends our prior work on the Flexure-FET MC receiver by integrating a competitive binding model to enhance performance in high-interference environments, where multiple molecular species coexist in the reception space. Previous studies have largely focused on ligand concentration estimation and detection, without fully addressing the effects of inter-species competition for receptor binding. Our proposed framework captures this competition, offering a more biologically accurate model for multitarget environments. By incorporating competition dynamics, the model improves understanding of MC behavior under interference. This approach enables fine-tuning of receptor responses by adjusting ligand concentrations and receptor affinities, thereby optimizing the performance of the Flexure-FET MC receiver. Comprehensive analysis shows that accounting for competitive binding is crucial for improving reliability and accuracy in complex MC systems. Factors such as signal-to-noise ratio (SNR), symbol error probability (SEP), interferer concentration, and receptor dynamics are shown to significantly affect performance. The proposed framework highlights the need to manage these factors effectively. Results demonstrate that modeling interference through competitive binding offers a realistic system perspective and allows tuning of receiver response, enabling robust detection in environments with multiple coexisting species.
Learning-Based Robust Fixed-Time Terminal Sliding Mode Control
In this paper, we develop and analyze an integral fixed-time sliding mode control method for a scenario in which the system model is only partially known, utilizing Gaussian processes. We present two theorems on fixed-time convergence. The first theorem addresses the fully known system model, while the second considers situations where the system's drift is approximated utilizing Gaussian processes (GP) for approximating unknown dynamics. Both theorems establish the global fixed-time stability of the closed-loop system. The stability analysis is based on a straightforward quadratic Lyapunov function. Our proposed method outperforms an established adaptive fixed-time sliding mode control approach, especially when ample training data is available.
PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow
Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the resolution of complex optimization problems. These developments are necessitated by the increased volatility and scale in energy production for modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics, from generating and solving OPF instances, to training and benchmarking machine learning models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML and OPF. PGLearn provides datasets that are representative of real-life operating conditions, by explicitly capturing both global and local variability in the data generation, and by, for the first time, including time series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. Standardized datasets are made publicly available to democratize access to this field, reduce the burden of data generation, and enable the fair comparison of various methodologies. PGLearn also includes a robust toolkit for training, evaluating, and benchmarking machine learning models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims at democratizing and accelerating research and innovation in machine learning applications for optimal power flow problems. Datasets are available for download at https://www.huggingface.co/PGLearn.
Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
We present the first formal treatment of \emph{yield tokenization}, a mechanism that decomposes yield-bearing assets into principal and yield components to facilitate risk transfer and price discovery in decentralized finance (DeFi). We propose a model that characterizes yield token dynamics using stochastic differential equations. We derive a no-arbitrage pricing framework for yield tokens, enabling their use in hedging future yield volatility and managing interest rate risk in decentralized lending pools. Taking DeFi lending as our focus, we show how both borrowers and lenders can use yield tokens to achieve optimal hedging outcomes and mitigate exposure to adversarial interest rate manipulation. Furthermore, we design automated market makers (AMMs) that incorporate a menu of bonding curves to aggregate liquidity from participants with heterogeneous risk preferences. This leads to an efficient and incentive-compatible mechanism for trading yield tokens and yield futures. Building on these foundations, we propose a modular \textit{fixed-rate} lending protocol that synthesizes on-chain yield token markets and lending pools, enabling robust interest rate discovery and enhancing capital efficiency. Our work provides the theoretical underpinnings for risk management and fixed-income infrastructure in DeFi, offering practical mechanisms for stable and sustainable yield markets.
Preference Adaptive and Sequential Text-to-Image Generation ICML 2025
We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
comment: Accepted to ICML 2025 Link to PASTA dataset: https://www.kaggle.com/datasets/googleai/pasta-data
Novelty Detection in Reinforcement Learning with World Models ICML
Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs then agent performance and reliability can dramatically decline. We refer to the sudden change in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches to incorporate novelty detection into world model RL agents, by utilizing the misalignment of the world model's hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL focused novelty detection algorithms.
comment: ICML Spotlight 2025
Embedding Safety into RL: A New Take on Trust Region Methods ICML 2025
Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
comment: Accepted at ICML 2025
Robust Output Tracking for an Uncertain and Nonlinear 3D PDE-ODE System: Preventing Induced Seismicity in Underground Reservoirs
This paper presents a robust control strategy for output tracking of a nonlinear 3D PDE-ODE system, where the ODE has logistic-like dynamics. The output feedback control was developed by bounding the solution and its time derivative for both the infinite-dimensional system and the nonlinear ODE. These bounds were then leveraged to ensure the boundedness of the control coefficient and the perturbations in the error dynamics. The mathematical framework proves the controller's ability to manage two output types within the system, overcoming model uncertainties and heterogeneities, using minimal system information, and a continuous control signal. A case study addressing induced seismicity mitigation while ensuring energy production in the Groningen gas reservoir highlights the control's effectiveness. The strategy guarantees precise tracking of target seismicity rates and pressures across reservoir regions, even under parameter uncertainties. Numerical simulations validate the approach in two scenarios: gas extraction while not exceeding the intrinsic seismicity of the region and the addition of CO2 injections, achieving net-zero environmental impact.
Learning-Based MPC for Fuel Efficient Control of Autonomous Vehicles with Discrete Gear Selection
Co-optimization of both vehicle speed and gear position via model predictive control (MPC) has been shown to offer benefits for fuel-efficient autonomous driving. However, optimizing both the vehicle's continuous dynamics and discrete gear positions may be too computationally intensive for a real-time implementation. This work proposes a learning-based MPC scheme to address this issue. A policy is trained to select and fix the gear positions across the prediction horizon of the MPC controller, leaving a significantly simpler continuous optimization problem to be solved online. In simulation, the proposed approach is shown to have a significantly lower computation burden and a comparable performance, with respect to pure MPC-based co-optimization.
comment: 7 pages, 3 figures, accepted for publication in L-CSS. Code available at https://github.com/SamuelMallick/mpcrl-vehicle-gears
Extending direct data-driven predictive control towards systems with finite control sets
Although classical model predictive control with finite control sets (FCS-MPC) is quite a popular control method, particularly in the realm of power electronics systems, its direct data-driven predictive control (FCS-DPC) counterpart has received relatively limited attention. In this paper, we introduce a novel reformulation of a commonly used DPC scheme that allows for the application of a modified sphere decoding algorithm, known for its efficiency and prominence in FCS-MPC applications. We test the reformulation on a popular electrical drive example and compare the computation times of sphere decoding FCS-DPC with an enumeration-based and a MIQP method.
comment: This paper is a preprint of a contribution to the 22nd European Control Conference 2024. 7 pages, 1 figure
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems ICML 2025
As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.
comment: 22 Pages, 12 Figures. First two authors have contributed equally. Accepted at ICML 2025
UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control
Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restoration tasks. However, these approaches frequently produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations, we propose UniDB, a unified framework for diffusion bridges based on Stochastic Optimal Control (SOC). UniDB formulates the problem through an SOC-based optimization and derives a closed-form solution for the optimal controller, thereby unifying and generalizing existing diffusion bridge models. We demonstrate that existing diffusion bridges employing Doob's $h$-transform constitute a special case of our framework, emerging when the terminal penalty coefficient in the SOC cost function tends to infinity. By incorporating a tunable terminal penalty coefficient, UniDB achieves an optimal balance between control costs and terminal penalties, substantially improving detail preservation and output quality. Notably, UniDB seamlessly integrates with existing diffusion bridge models, requiring only minimal code modifications. Extensive experiments across diverse image restoration tasks validate the superiority and adaptability of the proposed framework. Our code is available at https://github.com/UniDB-SOC/UniDB/.
Robust Data-Driven Predictive Control for Mixed Platoons under Noise and Attacks
Controlling mixed platoons, which consist of both connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), poses significant challenges due to the uncertain and unknown human driving behaviors. Data-driven control methods offer promising solutions by leveraging available trajectory data, but their performance can be compromised by noise and attacks. To address this issue, this paper proposes a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) framework based on data-driven reachability analysis. The framework over-approximates system dynamics under noise and attack using a matrix zonotope set derived from data, and develops a stabilizing feedback control law. By decoupling the mixed platoon system into nominal and error components, we employ data-driven reachability sets to recursively compute error reachable sets that account for noise and attacks, and obtain tightened safety constraints of the nominal system. This leads to a robust data-driven predictive control framework, solved in a tube-based control manner. Numerical simulations and human-in-the-loop experiments demonstrate that the RDeeP-LCC method significantly improves robustness against noise and attacks, while enhancing tracking accuracy, control efficiency, energy economy, and driving comfort.
comment: 15 pages, 7 figures
Adapting Gait Frequency for Posture-regulating Humanoid Push-recovery via Hierarchical Model Predictive Control ICRA 2025
Current humanoid push-recovery strategies often use whole-body motion, yet they tend to overlook posture regulation. For instance, in manipulation tasks, the upper body may need to stay upright and have minimal recovery displacement. This paper introduces a novel approach to enhancing humanoid push-recovery performance under unknown disturbances and regulating body posture by tailoring the recovery stepping strategy. We propose a hierarchical-MPC-based scheme that analyzes and detects instability in the prediction window and quickly recovers through adapting gait frequency. Our approach integrates a high-level nonlinear MPC, a posture-aware gait frequency adaptation planner, and a low-level convex locomotion MPC. The planners predict the center of mass (CoM) state trajectories that can be assessed for precursors of potential instability and posture deviation. In simulation, we demonstrate improved maximum recoverable impulse by 131% on average compared with baseline approaches. In hardware experiments, a 125 ms advancement in recovery stepping timing/reflex has been observed with the proposed approach. We also demonstrate improved push-recovery performance and minimized body attitude change under 0.2 rad.
comment: 7 pages, 6 figures, accepted to ICRA 2025
Gait-Net-augmented Implicit Kino-dynamic MPC for Dynamic Variable-frequency Humanoid Locomotion over Discrete Terrains
Reduced-order-model-based optimal control techniques for humanoid locomotion struggle to adapt step duration and placement simultaneously in dynamic walking gaits due to their reliance on fixed-time discretization, which limits responsiveness to various disturbances and results in suboptimal performance in challenging conditions. In this work, we propose a Gait-Net-augmented implicit kino-dynamic model-predictive control (MPC) to simultaneously optimize step location, step duration, and contact forces for natural variable-frequency locomotion. The proposed method incorporates a Gait-Net-augmented Sequential Convex MPC algorithm to solve multi-linearly constrained variables by iterative quadratic programs. At its core, a lightweight Gait-frequency Network (Gait-Net) determines the preferred step duration in terms of variable MPC sampling times, simplifying step duration optimization to the parameter level. Additionally, it enhances and updates the spatial reference trajectory within each sequential iteration by incorporating local solutions, allowing the projection of kinematic constraints to the design of reference trajectories. We validate the proposed algorithm in high-fidelity simulations and on small-size humanoid hardware, demonstrating its capability for variable-frequency and 3-D discrete terrain locomotion with only a one-step preview of terrain data.
comment: 15 pages, 13 figures, RSS 2025 accepted
Structure Identification of NDS with Descriptor Subsystems under Asynchronous, Non-Uniform, and Slow-Rate Sampling
Networked dynamic systems (NDS) exhibit collective behavior shaped by subsystem dynamics and complex interconnections, yet identifying these interconnections remains challenging due to irregularities in sampled data, including asynchronous, non-uniform, and low-rate sampling. This paper proposes a novel two-stage structure identification algorithm that leverages system zero-order moments, a concept traditionally used in model order reduction, to bridge system identification and model reduction. First, zero-order moments are estimated from steady-state time-domain outputs; second, subsystem interconnections are explicitly reconstructed from these moments. The method generalizes existing approaches by handling asynchronous, non-uniform, and slow sampling simultaneously, eliminating constraints on input signal periodicity and extending applicability to multi-input multi-output NDS with arbitrary interconnections. Unlike black-box identification techniques, our approach explicitly recovers subsystem interconnection structures. Validation on the IEEE 14-bus system demonstrates the algorithm's effectiveness in recovering subsystem interconnections from irregular sampling data.
comment: 8 pages, 3 figures, cdc2025
Criticality and Safety Margins for Reinforcement Learning
State of the art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Efforts to gauge the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety margins make these interpretable, when defined as the number of random actions for which performance loss will not exceed some tolerance with high confidence. We demonstrate this approach in several environment-agent combinations; for an A3C agent in an Atari Beamrider environment, the lowest 5% of safety margins contain 47% of agent losses; i.e., supervising only 5% of decisions could potentially prevent roughly half of an agent's errors. This criticality framework measures the potential impacts of bad decisions, even before those decisions are made, allowing for more effective debugging and oversight of autonomous agents.
comment: 17 pages, 10 figures
Optimizing Parameters of the LinDistFlow Power Flow Approximation for Distribution Systems
The DistFlow model accurately represents power flows in distribution systems, but the model's nonlinearities result in computational challenges for many applications. Accordingly, a linear approximation known as \mbox{LinDistFlow} (and its three-phase extension LinDist3Flow) is commonly employed. This paper introduces a parameter optimization algorithm for enhancing the accuracy of this approximation for both balanced single-phase equivalent and unbalanced three-phase distribution network models, with the goal of aligning the outputs more closely with those from the nonlinear DistFlow model. Using sensitivity information, our algorithm optimizes the LinDistFlow approximation's coefficient and bias parameters to minimize discrepancies in predictions of voltage magnitudes relative to the nonlinear DistFlow model. The parameter optimization algorithm employs the Truncated Newton Conjugate-Gradient (TNC) method to fine-tune coefficients and bias parameters during an offline training phase to improve the LinDistFlow approximation's accuracy. % in optimization applications. Numerical results underscore the algorithm's efficacy, showcasing accuracy improvements in $L_{1}$-norm and $L_{\infty}$-norm losses of up to $92\%$ and $88\%$, respectively, relative to the traditional LinDistFlow model. We also assess how the optimized parameters perform under changes in the network topology and demonstrate the optimized LinDistFlow approximation's efficacy in a hosting capacity optimization problem.
Deficient Excitation in Parameter Learning
This paper investigates parameter learning problems under deficient excitation (DE). The DE condition is a rank-deficient, and therefore, a more general evolution of the well-known persistent excitation condition. Under the DE condition, a proposed online algorithm is able to calculate the identifiable and non-identifiable subspaces, and finally give an optimal parameter estimate in the sense of least squares. In particular, the learning error within the identifiable subspace exponentially converges to zero in the noise-free case, even without persistent excitation. The DE condition also provides a new perspective for solving distributed parameter learning problems, where the challenge is posed by local regressors that are often insufficiently excited. To improve knowledge of the unknown parameters, a cooperative learning protocol is proposed for a group of estimators that collect measured information under complementary DE conditions. This protocol allows each local estimator to operate locally in its identifiable subspace, and reach a consensus with neighbours in its non-identifiable subspace. As a result, the task of estimating unknown parameters can be achieved in a distributed way using cooperative local estimators. Application examples in system identification are given to demonstrate the effectiveness of the theoretical results developed in this paper.
comment: 16 pages,9 figures
Nonlinear Network Identifiability with Full Excitations
We derive conditions for the identifiability of nonlinear networks characterized by additive dynamics at the level of the edges when all the nodes are excited. In contrast to linear systems, we show that the measurement of all sinks is necessary and sufficient for the identifiability of directed acyclic graphs, under the assumption that dynamics are described by analytic functions without constant terms (i.e., $f(0)=0$). But if constant terms are present, then the identifiability is impossible as soon as one node has more than one in-neighbor. In the case of general digraphs that may contain cycles, we consider additively separable functions for the analysis of the identifiability, and we show that the measurement of one node of all the sinks of the condensation digraph is necessary and sufficient. Several examples are added to illustrate the results.
comment: 14 pages, 6 figures, submitted to IEEE Transactions on Automatic Control
Systems and Control (EESS)
Data-Driven Control of Continuous-Time LTI Systems via Non-Minimal Realizations
This article proposes an approach to design output-feedback controllers for unknown continuous-time linear time-invariant systems using only input-output data from a single experiment. To address the lack of state and derivative measurements, we introduce non-minimal realizations whose states can be observed by filtering the available data. We first apply this concept to the disturbance-free case, formulating linear matrix inequalities (LMIs) from batches of sampled signals to design a dynamic, filter-based stabilizing controller. The framework is then extended to the problem of asymptotic tracking and disturbance rejection - in short, output regulation - by incorporating an internal model based on prior knowledge of the disturbance/reference frequencies. Finally, we discuss tuning strategies for a class of multi-input multi-output systems and illustrate the method via numerical examples.
Learning to Pursue AC Optimal Power Flow Solutions with Feasibility Guarantees
This paper focuses on an AC optimal power flow (OPF) problem for distribution feeders equipped with controllable distributed energy resources (DERs). We consider a solution method that is based on a continuous approximation of the projected gradient flow - referred to as the safe gradient flow - that incorporates voltage and current information obtained either through real-time measurements or power flow computations. These two setups enable both online and offline implementations. The safe gradient flow involves the solution of convex quadratic programs (QPs). To enhance computational efficiency, we propose a novel framework that employs a neural network approximation of the optimal solution map of the QP. The resulting method has two key features: (a) it ensures that the DERs' setpoints are practically feasible, even for an online implementation or when an offline algorithm has an early termination; (b) it ensures convergence to a neighborhood of a strict local optimizer of the AC OPF. The proposed method is tested on a 93-node distribution system with realistic loads and renewable generation. The test shows that our method successfully regulates voltages within limits during periods with high renewable generation.
Current trends and future directions in event-based control
The defining characteristic of event-based control is that feedback loops are only closed when indicated by a triggering condition that takes recent information about the system into account. This stands in contrast to periodic control where the feedback loop is closed periodically. Benefits of event-based control arise when sampling comes at a cost, which occurs, e.g., for Networked Control Systems or in other setups with resource constraints. A rapidly growing number of publications deals with event-based control. Nevertheless, some fundamental questions about event-based control are still unsolved. In this article, we provide an overview of current research trends in event-based control. We focus on results that aim for a better understanding of effects that occur in feedback loops with event-based control. Based on this summary, we identify important open directions for future research.
comment: Preprint submitted to the European Journal of Control
Operator-Splitting Methods for Neuromorphic Circuit Simulation
A novel splitting algorithm is proposed for the numerical simulation of neuromorphic circuits. The algorithm is grounded in the operator-theoretic concept of monotonicity, which bears both physical and algorithmic significance. The splitting exploits this correspondence to translate the circuit architecture into the algorithmic architecture. The paper illustrates the many advantages of the proposed operator-theoretic framework over conventional numerical integration for the simulation of multiscale hierarchical events that characterize neuromorphic behaviors.
State and Input Constrained Adaptive Tracking Control of Uncertain Euler-Lagrange Systems with Robustness and Feasibility Analysis
This paper proposes an adaptive tracking controller for uncertain Euler-Lagrange (E-L) systems with user-defined state and input constraints in presence of bounded external disturbances. A barrier Lyapunov function (BLF) is employed for state constraint satisfaction, integrated with a saturated controller that ensures the control input remains within pre-specified bounds. To the best of the authors' knowledge, this is the first result on tracking control of state and input-constrained uncertain E-L systems that provides verifiable conditions for the existence of a feasible control policy. The efficacy of the proposed controller in terms of constraint satisfaction and tracking performance is demonstrated through simulation on a robotic manipulator system.
comment: arXiv admin note: text overlap with arXiv:2212.08495
State Constrained Model Reference Adaptive Control with Input Amplitude and Rate Limits
This paper proposes a robust model reference adaptive controller (MRAC) for uncertain multi-input multi-output (MIMO) linear time-invariant (LTI) plants with user-defined constraints on the plant states, input amplitude, and input rate. The proposed two-layer barrier Lyapunov function (BLF)-based control design considers the input and the input rate as states that are constrained using two BLFs in the first layer, while another BLF in the second layer constrains the plant states. The adaptive control law ensures that the plant states, input amplitude, and input rate remain within the user-defined safe sets despite unmatched bounded disturbances. Sufficient conditions for the existence of a feasible control policy are also provided. To the best of the authors' knowledge, this is the first optimization-free method that imposes user-defined constraints on the state, input, and input rate and also provides verifiable feasibility conditions in the presence of parametric uncertainties and disturbances. Simulation results demonstrate the effectiveness of the proposed algorithm.
A Multi-output Gaussian Process Regression with Negative Transfer Mitigation for Generating Boundary Test Scenarios of Multi-UAV Systems
Adaptive sampling based on Gaussian process regression (GPR) has already been applied with considerable success to generate boundary test scenarios for multi-UAV systems (MUS). One of the key techniques in such researches is leveraging the accurate prediction of the MUS performance through GPR in different test scenarios. Due to the potential correlations among the multiple MUS performance metrics, current researches commonly utilize a multi-output GPR (MOGPR) to model the multiple performance metrics simultaneously. This approach can achieve a more accurate prediction, rather than modeling each metric individually. However, MOGPR still suffers from negative transfer. When the feature of one output variable is incorrectly learned by another, the models training process will be negatively affected, leading to a decline in prediction performance. To solve this problem, this paper proposes a novel adaptive regularization approach into the conventional MOGPR training process. Unlike existing regularization approaches for mitigating negative transfer in MOGPR, our method penalizes the inconsistencies among output-specific characteristic parameters using adaptively adjustable regularization weights. This mechanism helps each set of output parameters avoid local optima. Consequently, it yields simultaneous improvements in predictive accuracy across all outputs. Finally, we validate our approach on a numerical case and on a boundary test scenario generation case for a MUS multi-objectives search task.
comment: Article,29 pages,18 figures
On data usage and predictive behavior of data-driven predictive control with 1-norm regularization
We investigate the data usage and predictive behavior of data-driven predictive control (DPC) with 1-norm regularization. Our analysis enables the offline removal of unused data and facilitates a comparison between the identified symmetric structure and data usage against prior knowledge of the true system. This comparison helps assess the suitability of the DPC scheme for effective control.
comment: This paper is a preprint of a contribution to the IEEE Control Systems Letters. 6 pages, 3 figures
A memristive model of spatio-temporal excitability
This paper introduces a model of excitability that unifies the mechanism of an important neuronal property both in time and in space. As a starting point, we revisit both a key model of temporal excitability, proposed by Hodgkin and Huxley, and a key model of spatial excitability, proposed by Amari. We then propose a novel model that captures the temporal and spatial properties of both models. Our aim is to regard neuronal excitability as a property across scales, and to explore the benefits of modeling excitability with one and the same mechanism, whether at the cellular or the population level.
comment: 7 pages, 9 figures, submitted for CDC 2025
Dynamic State-Feedback Control for LPV Systems: Ensuring Stability and LQR Performance
In this paper, we propose a novel dynamic state-feedback controller for polytopic linear parameter-varying (LPV) systems with constant input matrix. The controller employs a projected gradient flow method to continuously improve its control law and, under established conditions, converges to the optimal feedback gain of the corresponding linear quadratic regulator (LQR) problem associated with constant parameter trajectories. We derive conditions for quadratic stability, which can be verified via convex optimization, to ensure exponential stability of the LPV system even under arbitrarily fast parameter variations. Additionally, we provide sufficient conditions to guarantee the boundedness of the trajectories of the dynamic controller for any parameter trajectory and the convergence of its feedback gains to the optimal LQR gains for constant parameter trajectories. Furthermore, we show that the closed-loop system is asymptotically stable for constant parameter trajectories under these conditions. Simulation results demonstrate that the controller maintains stability and improves transient performance.
comment: submitted to Transactions on Automatic Control
Efficient Dynamic Shielding for Parametric Safety Specifications
Shielding has emerged as a promising approach for ensuring safety of AI-controlled autonomous systems. The algorithmic goal is to compute a shield, which is a runtime safety enforcement tool that needs to monitor and intervene the AI controller's actions if safety could be compromised otherwise. Traditional shields are designed statically for a specific safety requirement. Therefore, if the safety requirement changes at runtime due to changing operating conditions, the shield needs to be recomputed from scratch, causing delays that could be fatal. We introduce dynamic shields for parametric safety specifications, which are succinctly represented sets of all possible safety specifications that may be encountered at runtime. Our dynamic shields are statically designed for a given safety parameter set, and are able to dynamically adapt as the true safety specification (permissible by the parameters) is revealed at runtime. The main algorithmic novelty lies in the dynamic adaptation procedure, which is a simple and fast algorithm that utilizes known features of standard safety shields, like maximal permissiveness. We report experimental results for a robot navigation problem in unknown territories, where the safety specification evolves as new obstacles are discovered at runtime. In our experiments, the dynamic shields took a few minutes for their offline design, and took between a fraction of a second and a few seconds for online adaptation at each step, whereas the brute-force online recomputation approach was up to 5 times slower.
Physical Reduced Stochastic Equations for Continuously Monitored Non-Markovian Quantum Systems with a Markovian Embedding
An effective approach to modeling non-Markovian quantum systems is to embed a principal (quantum) system of interest into a larger quantum system. A widely employed embedding is one that uses another quantum system, referred to as the auxiliary system, which is coupled to the principal system, and both the principal and auxiliary can be coupled to quantum white noise processes. The principal and auxiliary together form a quantum Markov system and the quantum white noises act as a bath (environment) for this system. Recently it was shown that the conditional evolution of the principal system in this embedding under continuous monitoring by a travelling quantum probe can be expressed as a system of coupled stochastic differential equations (SDEs) that involve only operators of the principal system. The reduced conditional state of the principal only (conditioned on the measurement outcomes) are determined by the "diagonal" blocks of this coupled systems of SDEs. It is shown here that the "off-diagonal" blocks can be exactly eliminated up to their initial conditions, leaving a reduced closed system of SDEs for the diagonal blocks only. Under additional conditions the off-diagonal initial conditions can be made to vanish. This new closed system of equations, which includes an integration term involving a two-time stochastic kernel, represents the non-Markovian stochastic dynamics of the principal system under continuous-measurement. The system of equations determine the reduced conditional state of the principal only and may be viewed as a stochastic Nakajima-Zwanzig type of equation for continuously monitored non-Markovian quantum systems.
comment: 7 pages, no figures. Accepted for publication in IEEE Control Systems Letters (http://ieee-cssletters.dei.unipd.it/)
Properties of zero-determinant strategies in multichannel games
Controlling payoffs in repeated games is one of the important topics in control theory of multi-agent systems. Recently proposed zero-determinant strategies enable players to unilaterally enforce linear relations between payoffs. Furthermore, based on the mathematics of zero-determinant strategies, regional payoff control, in which payoffs are enforced into some feasible regions, has been discovered in social dilemma situations. More recently, theory of payoff control was extended to multichannel games, where players parallelly interact with each other in multiple channels. However, properties of zero-determinant strategies specific to multichannel games are still not clear. In this paper, we elucidate properties of zero-determinant strategies in multichannel games. First, we relate the existence condition of zero-determinant strategies in multichannel games to that of zero-determinant strategies in each channel. We then show that the existence of zero-determinant strategies in multichannel games requires the existence of zero-determinant strategies in some channels. This result implies that the existence of zero-determinant strategies in multichannel games is tightly restricted by structure of games played in each channel.
comment: 12 pages
Large Language Models for Solving Economic Dispatch Problem
This paper investigates the capability of off-the-shelf large language models (LLMs) to solve the economic dispatch (ED) problem. ED is a hard-constrained optimization problem solved on a day-ahead timescale by grid operators to minimize electricity generation costs while accounting for physical and engineering constraints. Numerous approaches have been proposed, but these typically require either mathematical formulations, face convergence issues, or depend on extensive labeled data and training time. This work implements LLMs enhanced with reasoning capabilities to address the classic lossless ED problem. The proposed approach avoids the need for explicit mathematical formulations, does not suffer from convergence challenges, and requires neither labeled data nor extensive training. A few-shot learning technique is utilized in two different prompting contexts. The IEEE 118-bus system with 19 generation units serves as the evaluation benchmark. Results demonstrate that various prompting strategies enable LLMs to effectively solve the ED problem, offering a convenient and efficient alternative. Consequently, this approach presents a promising future solution for ED tasks, particularly when foundational power system models are available.
comment: 5 pages, 3 figures, Accepted, 2025 IEEE Energy Conversion Conference and Expo (ECCE 2025), Philadelphia, PA
Online distributed optimization for spatio-temporally constrained real-time peer-to-peer energy trading
The proliferation of distributed renewable energy triggers the peer-to-peer (P2P) energy market formations. To make profits, prosumers equipped with photovoltaic (PV) panels and even the energy storage system (ESS) can actively participate in the real-time P2P energy market and trade energy. However, in real situations, system states such as energy demands and renewable energy power generation are highly uncertain, making it difficult for prosumers to make optimal real-time decisions. Moreover, severe problems with the physical network can arise from the real-time P2P energy trading, such as bus voltage violations and line overload. To handle these problems, this work first formulates the real-time P2P energy trading problem as a spatio-temporally constrained stochastic optimization problem by considering ESS and the spatial physical network constraints. To deal with the uncertainties online, a modified Lyapunov optimization method is innovatively proposed to approximately reformulate the stochastic optimization problem into an online one by relaxing the time-coupling constraints. Compared with the state-of-the-art online methods, the proposed one renders more flexibility and better performance for the real-time P2P energy market operation. Additionally, to protect the prosumers' privacy, an online distributed algorithm based on the consensus alternating direction method of multipliers (ADMM) is developed to solve the reformulated online problem by decoupling the spatial constraints. The theoretical near-optimal performance guarantee of the proposed online distributed algorithm is derived, and its performance can be further improved by minimizing the performance gap. Simulation results demonstrate that the proposed online distributed algorithm can guarantee the fast, stable, and safe long-term operation of the real-time P2P energy market.
A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem
We propose a physics-informed neural networks (PINNs) framework to solve the infinite-horizon optimal control problem of nonlinear systems. In particular, since PINNs are generally able to solve a class of partial differential equations (PDEs), they can be employed to learn the value function of the infinite-horizon optimal control problem via solving the associated steady-state Hamilton-Jacobi-Bellman (HJB) equation. However, an issue here is that the steady-state HJB equation generally yields multiple solutions; hence if PINNs are directly employed to it, they may end up approximating a solution that is different from the optimal value function of the problem. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution, and which uniformly approximates the optimal value function as the horizon increases. An algorithm to verify if the chosen horizon is large enough is also given, as well as a method to extend it -- with reduced computations and robustness to approximation errors -- in case it is not. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions, does not require prior knowledge of a stabilizing controller, and does not perform iterative policy evaluations. Simulations are performed, which verify and clarify theoretical findings.
comment: Accepted with minor revisions at International Journal of Robust and Nonlinear Control
Nonadaptive Output Regulation of Second-Order Nonlinear Uncertain Systems
This paper investigates the robust output regulation problem of second-order nonlinear uncertain systems with an unknown exosystem. Instead of the adaptive control approach, this paper resorts to a robust control methodology to solve the problem and thus avoid the bursting phenomenon. In particular, this paper constructs generic internal models for the steady-state state and input variables of the system. By introducing a coordinate transformation, this paper converts the robust output regulation problem into a nonadaptive stabilization problem of an augmented system composed of the second-order nonlinear uncertain system and the generic internal models. Then, we design the stabilization control law and construct a strict Lyapunov function that guarantees the robustness with respect to unmodeled disturbances. The analysis shows that the output zeroing manifold of the augmented system can be made attractive by the proposed nonadaptive control law, which solves the robust output regulation problem. Finally, we demonstrate the effectiveness of the proposed nonadaptive internal model approach by its application to the control of the Duffing system.
comment: 8 pages, 3 figures
Spring-Brake! Handed Shearing Auxetics Improve Efficiency of Hopping and Standing
Energy efficiency is critical to the success of legged robotics. Efficiency is lost through wasted energy during locomotion and standing. Including elastic elements has been shown to reduce movement costs, while including breaks can reduce standing costs. However, adding separate elements for each increases the mass and complexity of a leg, reducing overall system performance. Here we present a novel compliant mechanism using a Handed Shearing Auxetic (HSA) that acts as a spring and break in a monopod hopping robot. The HSA acts as a parallel elastic actuator, reducing electrical power for dynamic hopping and matching the efficiency of state-of-the-art compliant hoppers. The HSA\u2019s auxetic behavior enables dual functionality. During static tasks, it locks under large forces with minimal input power by blocking deformation, creating high friction similar to a capstan mechanism. This allows the leg to support heavy loads without motor torque, addressing thermal inefficiency. The multi-functional design enhances both dynamic and static performance, offering a versatile solution for robotic applications.
comment: This work has been submitted to the IEEE for possible publication
Local Stability and Region of Attraction Analysis for Neural Network Feedback Systems under Positivity Constraints
We study the local stability of nonlinear systems in the Lur'e form with static nonlinear feedback realized by feedforward neural networks (FFNNs). By leveraging positivity system constraints, we employ a localized variant of the Aizerman conjecture, which provides sufficient conditions for exponential stability of trajectories confined to a compact set. Using this foundation, we develop two distinct methods for estimating the Region of Attraction (ROA): (i) a less conservative Lyapunov-based approach that constructs invariant sublevel sets of a quadratic function satisfying a linear matrix inequality (LMI), and (ii) a novel technique for computing tight local sector bounds for FFNNs via layer-wise propagation of linear relaxations. These bounds are integrated into the localized Aizerman framework to certify local exponential stability. Numerical results demonstrate substantial improvements over existing integral quadratic constraint-based approaches in both ROA size and scalability.
comment: Submitted to 64th IEEE Conference on Decision and Control 2025 - Rio de Janeiro, Brazil
Flexure-FET-Based Receiver with Competitive Binding for Interference Mitigation in Molecular Communication
Molecular communication (MC), a biologically inspired technology, enables applications in nanonetworks and the Internet of Everything (IoE), with great potential for intra-body systems such as drug delivery, health monitoring, and disease detection. This paper extends our prior work on the Flexure-FET MC receiver by integrating a competitive binding model to enhance performance in high-interference environments, where multiple molecular species coexist in the reception space. Previous studies have largely focused on ligand concentration estimation and detection, without fully addressing the effects of inter-species competition for receptor binding. Our proposed framework captures this competition, offering a more biologically accurate model for multitarget environments. By incorporating competition dynamics, the model improves understanding of MC behavior under interference. This approach enables fine-tuning of receptor responses by adjusting ligand concentrations and receptor affinities, thereby optimizing the performance of the Flexure-FET MC receiver. Comprehensive analysis shows that accounting for competitive binding is crucial for improving reliability and accuracy in complex MC systems. Factors such as signal-to-noise ratio (SNR), symbol error probability (SEP), interferer concentration, and receptor dynamics are shown to significantly affect performance. The proposed framework highlights the need to manage these factors effectively. Results demonstrate that modeling interference through competitive binding offers a realistic system perspective and allows tuning of receiver response, enabling robust detection in environments with multiple coexisting species.
Learning-Based Robust Fixed-Time Terminal Sliding Mode Control
In this paper, we develop and analyze an integral fixed-time sliding mode control method for a scenario in which the system model is only partially known, utilizing Gaussian processes. We present two theorems on fixed-time convergence. The first theorem addresses the fully known system model, while the second considers situations where the system's drift is approximated utilizing Gaussian processes (GP) for approximating unknown dynamics. Both theorems establish the global fixed-time stability of the closed-loop system. The stability analysis is based on a straightforward quadratic Lyapunov function. Our proposed method outperforms an established adaptive fixed-time sliding mode control approach, especially when ample training data is available.
PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow
Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the resolution of complex optimization problems. These developments are necessitated by the increased volatility and scale in energy production for modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics, from generating and solving OPF instances, to training and benchmarking machine learning models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML and OPF. PGLearn provides datasets that are representative of real-life operating conditions, by explicitly capturing both global and local variability in the data generation, and by, for the first time, including time series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. Standardized datasets are made publicly available to democratize access to this field, reduce the burden of data generation, and enable the fair comparison of various methodologies. PGLearn also includes a robust toolkit for training, evaluating, and benchmarking machine learning models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims at democratizing and accelerating research and innovation in machine learning applications for optimal power flow problems. Datasets are available for download at https://www.huggingface.co/PGLearn.
Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
We present the first formal treatment of \emph{yield tokenization}, a mechanism that decomposes yield-bearing assets into principal and yield components to facilitate risk transfer and price discovery in decentralized finance (DeFi). We propose a model that characterizes yield token dynamics using stochastic differential equations. We derive a no-arbitrage pricing framework for yield tokens, enabling their use in hedging future yield volatility and managing interest rate risk in decentralized lending pools. Taking DeFi lending as our focus, we show how both borrowers and lenders can use yield tokens to achieve optimal hedging outcomes and mitigate exposure to adversarial interest rate manipulation. Furthermore, we design automated market makers (AMMs) that incorporate a menu of bonding curves to aggregate liquidity from participants with heterogeneous risk preferences. This leads to an efficient and incentive-compatible mechanism for trading yield tokens and yield futures. Building on these foundations, we propose a modular \textit{fixed-rate} lending protocol that synthesizes on-chain yield token markets and lending pools, enabling robust interest rate discovery and enhancing capital efficiency. Our work provides the theoretical underpinnings for risk management and fixed-income infrastructure in DeFi, offering practical mechanisms for stable and sustainable yield markets.
Preference Adaptive and Sequential Text-to-Image Generation ICML 2025
We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
comment: Accepted to ICML 2025 Link to PASTA dataset: https://www.kaggle.com/datasets/googleai/pasta-data
Novelty Detection in Reinforcement Learning with World Models ICML
Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs then agent performance and reliability can dramatically decline. We refer to the sudden change in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches to incorporate novelty detection into world model RL agents, by utilizing the misalignment of the world model's hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL focused novelty detection algorithms.
comment: ICML Spotlight 2025
Embedding Safety into RL: A New Take on Trust Region Methods ICML 2025
Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
comment: Accepted at ICML 2025
Robust Output Tracking for an Uncertain and Nonlinear 3D PDE-ODE System: Preventing Induced Seismicity in Underground Reservoirs
This paper presents a robust control strategy for output tracking of a nonlinear 3D PDE-ODE system, where the ODE has logistic-like dynamics. The output feedback control was developed by bounding the solution and its time derivative for both the infinite-dimensional system and the nonlinear ODE. These bounds were then leveraged to ensure the boundedness of the control coefficient and the perturbations in the error dynamics. The mathematical framework proves the controller's ability to manage two output types within the system, overcoming model uncertainties and heterogeneities, using minimal system information, and a continuous control signal. A case study addressing induced seismicity mitigation while ensuring energy production in the Groningen gas reservoir highlights the control's effectiveness. The strategy guarantees precise tracking of target seismicity rates and pressures across reservoir regions, even under parameter uncertainties. Numerical simulations validate the approach in two scenarios: gas extraction while not exceeding the intrinsic seismicity of the region and the addition of CO2 injections, achieving net-zero environmental impact.
Learning-Based MPC for Fuel Efficient Control of Autonomous Vehicles with Discrete Gear Selection
Co-optimization of both vehicle speed and gear position via model predictive control (MPC) has been shown to offer benefits for fuel-efficient autonomous driving. However, optimizing both the vehicle's continuous dynamics and discrete gear positions may be too computationally intensive for a real-time implementation. This work proposes a learning-based MPC scheme to address this issue. A policy is trained to select and fix the gear positions across the prediction horizon of the MPC controller, leaving a significantly simpler continuous optimization problem to be solved online. In simulation, the proposed approach is shown to have a significantly lower computation burden and a comparable performance, with respect to pure MPC-based co-optimization.
comment: 7 pages, 3 figures, accepted for publication in L-CSS. Code available at https://github.com/SamuelMallick/mpcrl-vehicle-gears
Extending direct data-driven predictive control towards systems with finite control sets
Although classical model predictive control with finite control sets (FCS-MPC) is quite a popular control method, particularly in the realm of power electronics systems, its direct data-driven predictive control (FCS-DPC) counterpart has received relatively limited attention. In this paper, we introduce a novel reformulation of a commonly used DPC scheme that allows for the application of a modified sphere decoding algorithm, known for its efficiency and prominence in FCS-MPC applications. We test the reformulation on a popular electrical drive example and compare the computation times of sphere decoding FCS-DPC with an enumeration-based and a MIQP method.
comment: This paper is a preprint of a contribution to the 22nd European Control Conference 2024. 7 pages, 1 figure
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems ICML 2025
As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.
comment: 22 Pages, 12 Figures. First two authors have contributed equally. Accepted at ICML 2025
UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control
Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restoration tasks. However, these approaches frequently produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations, we propose UniDB, a unified framework for diffusion bridges based on Stochastic Optimal Control (SOC). UniDB formulates the problem through an SOC-based optimization and derives a closed-form solution for the optimal controller, thereby unifying and generalizing existing diffusion bridge models. We demonstrate that existing diffusion bridges employing Doob's $h$-transform constitute a special case of our framework, emerging when the terminal penalty coefficient in the SOC cost function tends to infinity. By incorporating a tunable terminal penalty coefficient, UniDB achieves an optimal balance between control costs and terminal penalties, substantially improving detail preservation and output quality. Notably, UniDB seamlessly integrates with existing diffusion bridge models, requiring only minimal code modifications. Extensive experiments across diverse image restoration tasks validate the superiority and adaptability of the proposed framework. Our code is available at https://github.com/UniDB-SOC/UniDB/.
Robust Data-Driven Predictive Control for Mixed Platoons under Noise and Attacks
Controlling mixed platoons, which consist of both connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), poses significant challenges due to the uncertain and unknown human driving behaviors. Data-driven control methods offer promising solutions by leveraging available trajectory data, but their performance can be compromised by noise and attacks. To address this issue, this paper proposes a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) framework based on data-driven reachability analysis. The framework over-approximates system dynamics under noise and attack using a matrix zonotope set derived from data, and develops a stabilizing feedback control law. By decoupling the mixed platoon system into nominal and error components, we employ data-driven reachability sets to recursively compute error reachable sets that account for noise and attacks, and obtain tightened safety constraints of the nominal system. This leads to a robust data-driven predictive control framework, solved in a tube-based control manner. Numerical simulations and human-in-the-loop experiments demonstrate that the RDeeP-LCC method significantly improves robustness against noise and attacks, while enhancing tracking accuracy, control efficiency, energy economy, and driving comfort.
comment: 15 pages, 7 figures
Adapting Gait Frequency for Posture-regulating Humanoid Push-recovery via Hierarchical Model Predictive Control ICRA 2025
Current humanoid push-recovery strategies often use whole-body motion, yet they tend to overlook posture regulation. For instance, in manipulation tasks, the upper body may need to stay upright and have minimal recovery displacement. This paper introduces a novel approach to enhancing humanoid push-recovery performance under unknown disturbances and regulating body posture by tailoring the recovery stepping strategy. We propose a hierarchical-MPC-based scheme that analyzes and detects instability in the prediction window and quickly recovers through adapting gait frequency. Our approach integrates a high-level nonlinear MPC, a posture-aware gait frequency adaptation planner, and a low-level convex locomotion MPC. The planners predict the center of mass (CoM) state trajectories that can be assessed for precursors of potential instability and posture deviation. In simulation, we demonstrate improved maximum recoverable impulse by 131% on average compared with baseline approaches. In hardware experiments, a 125 ms advancement in recovery stepping timing/reflex has been observed with the proposed approach. We also demonstrate improved push-recovery performance and minimized body attitude change under 0.2 rad.
comment: 7 pages, 6 figures, accepted to ICRA 2025
Gait-Net-augmented Implicit Kino-dynamic MPC for Dynamic Variable-frequency Humanoid Locomotion over Discrete Terrains
Reduced-order-model-based optimal control techniques for humanoid locomotion struggle to adapt step duration and placement simultaneously in dynamic walking gaits due to their reliance on fixed-time discretization, which limits responsiveness to various disturbances and results in suboptimal performance in challenging conditions. In this work, we propose a Gait-Net-augmented implicit kino-dynamic model-predictive control (MPC) to simultaneously optimize step location, step duration, and contact forces for natural variable-frequency locomotion. The proposed method incorporates a Gait-Net-augmented Sequential Convex MPC algorithm to solve multi-linearly constrained variables by iterative quadratic programs. At its core, a lightweight Gait-frequency Network (Gait-Net) determines the preferred step duration in terms of variable MPC sampling times, simplifying step duration optimization to the parameter level. Additionally, it enhances and updates the spatial reference trajectory within each sequential iteration by incorporating local solutions, allowing the projection of kinematic constraints to the design of reference trajectories. We validate the proposed algorithm in high-fidelity simulations and on small-size humanoid hardware, demonstrating its capability for variable-frequency and 3-D discrete terrain locomotion with only a one-step preview of terrain data.
comment: 15 pages, 13 figures, RSS 2025 accepted
Structure Identification of NDS with Descriptor Subsystems under Asynchronous, Non-Uniform, and Slow-Rate Sampling
Networked dynamic systems (NDS) exhibit collective behavior shaped by subsystem dynamics and complex interconnections, yet identifying these interconnections remains challenging due to irregularities in sampled data, including asynchronous, non-uniform, and low-rate sampling. This paper proposes a novel two-stage structure identification algorithm that leverages system zero-order moments, a concept traditionally used in model order reduction, to bridge system identification and model reduction. First, zero-order moments are estimated from steady-state time-domain outputs; second, subsystem interconnections are explicitly reconstructed from these moments. The method generalizes existing approaches by handling asynchronous, non-uniform, and slow sampling simultaneously, eliminating constraints on input signal periodicity and extending applicability to multi-input multi-output NDS with arbitrary interconnections. Unlike black-box identification techniques, our approach explicitly recovers subsystem interconnection structures. Validation on the IEEE 14-bus system demonstrates the algorithm's effectiveness in recovering subsystem interconnections from irregular sampling data.
comment: 8 pages, 3 figures, cdc2025
Criticality and Safety Margins for Reinforcement Learning
State of the art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Efforts to gauge the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety margins make these interpretable, when defined as the number of random actions for which performance loss will not exceed some tolerance with high confidence. We demonstrate this approach in several environment-agent combinations; for an A3C agent in an Atari Beamrider environment, the lowest 5% of safety margins contain 47% of agent losses; i.e., supervising only 5% of decisions could potentially prevent roughly half of an agent's errors. This criticality framework measures the potential impacts of bad decisions, even before those decisions are made, allowing for more effective debugging and oversight of autonomous agents.
comment: 17 pages, 10 figures
Optimizing Parameters of the LinDistFlow Power Flow Approximation for Distribution Systems
The DistFlow model accurately represents power flows in distribution systems, but the model's nonlinearities result in computational challenges for many applications. Accordingly, a linear approximation known as \mbox{LinDistFlow} (and its three-phase extension LinDist3Flow) is commonly employed. This paper introduces a parameter optimization algorithm for enhancing the accuracy of this approximation for both balanced single-phase equivalent and unbalanced three-phase distribution network models, with the goal of aligning the outputs more closely with those from the nonlinear DistFlow model. Using sensitivity information, our algorithm optimizes the LinDistFlow approximation's coefficient and bias parameters to minimize discrepancies in predictions of voltage magnitudes relative to the nonlinear DistFlow model. The parameter optimization algorithm employs the Truncated Newton Conjugate-Gradient (TNC) method to fine-tune coefficients and bias parameters during an offline training phase to improve the LinDistFlow approximation's accuracy. % in optimization applications. Numerical results underscore the algorithm's efficacy, showcasing accuracy improvements in $L_{1}$-norm and $L_{\infty}$-norm losses of up to $92\%$ and $88\%$, respectively, relative to the traditional LinDistFlow model. We also assess how the optimized parameters perform under changes in the network topology and demonstrate the optimized LinDistFlow approximation's efficacy in a hosting capacity optimization problem.
Deficient Excitation in Parameter Learning
This paper investigates parameter learning problems under deficient excitation (DE). The DE condition is a rank-deficient, and therefore, a more general evolution of the well-known persistent excitation condition. Under the DE condition, a proposed online algorithm is able to calculate the identifiable and non-identifiable subspaces, and finally give an optimal parameter estimate in the sense of least squares. In particular, the learning error within the identifiable subspace exponentially converges to zero in the noise-free case, even without persistent excitation. The DE condition also provides a new perspective for solving distributed parameter learning problems, where the challenge is posed by local regressors that are often insufficiently excited. To improve knowledge of the unknown parameters, a cooperative learning protocol is proposed for a group of estimators that collect measured information under complementary DE conditions. This protocol allows each local estimator to operate locally in its identifiable subspace, and reach a consensus with neighbours in its non-identifiable subspace. As a result, the task of estimating unknown parameters can be achieved in a distributed way using cooperative local estimators. Application examples in system identification are given to demonstrate the effectiveness of the theoretical results developed in this paper.
comment: 16 pages,9 figures
Nonlinear Network Identifiability with Full Excitations
We derive conditions for the identifiability of nonlinear networks characterized by additive dynamics at the level of the edges when all the nodes are excited. In contrast to linear systems, we show that the measurement of all sinks is necessary and sufficient for the identifiability of directed acyclic graphs, under the assumption that dynamics are described by analytic functions without constant terms (i.e., $f(0)=0$). But if constant terms are present, then the identifiability is impossible as soon as one node has more than one in-neighbor. In the case of general digraphs that may contain cycles, we consider additively separable functions for the analysis of the identifiability, and we show that the measurement of one node of all the sinks of the condensation digraph is necessary and sufficient. Several examples are added to illustrate the results.
comment: 14 pages, 6 figures, submitted to IEEE Transactions on Automatic Control
Robotics
CLAMP: Crowdsourcing a LArge-scale in-the-wild haptic dataset with an open-source device for Multimodal robot Perception
Robust robot manipulation in unstructured environments often requires understanding object properties that extend beyond geometry, such as material or compliance-properties that can be challenging to infer using vision alone. Multimodal haptic sensing provides a promising avenue for inferring such properties, yet progress has been constrained by the lack of large, diverse, and realistic haptic datasets. In this work, we introduce the CLAMP device, a low-cost (<\$200) sensorized reacher-grabber designed to collect large-scale, in-the-wild multimodal haptic data from non-expert users in everyday settings. We deployed 16 CLAMP devices to 41 participants, resulting in the CLAMP dataset, the largest open-source multimodal haptic dataset to date, comprising 12.3 million datapoints across 5357 household objects. Using this dataset, we train a haptic encoder that can infer material and compliance object properties from multimodal haptic data. We leverage this encoder to create the CLAMP model, a visuo-haptic perception model for material recognition that generalizes to novel objects and three robot embodiments with minimal finetuning. We also demonstrate the effectiveness of our model in three real-world robot manipulation tasks: sorting recyclable and non-recyclable waste, retrieving objects from a cluttered bag, and distinguishing overripe from ripe bananas. Our results show that large-scale, in-the-wild haptic data collection can unlock new capabilities for generalizable robot manipulation. Website: https://emprise.cs.cornell.edu/clamp/
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
Synthesizing whole-body manipulation of articulated objects, including body motion, hand motion, and object motion, is a critical yet challenging task with broad applications in virtual humans and robotics. The core challenges are twofold. First, achieving realistic whole-body motion requires tight coordination between the hands and the rest of the body, as their movements are interdependent during manipulation. Second, articulated object manipulation typically involves high degrees of freedom and demands higher precision, often requiring the fingers to be placed at specific regions to actuate movable parts. To address these challenges, we propose a novel coordinated diffusion noise optimization framework. Specifically, we perform noise-space optimization over three specialized diffusion models for the body, left hand, and right hand, each trained on its own motion dataset to improve generalization. Coordination naturally emerges through gradient flow along the human kinematic chain, allowing the global body posture to adapt in response to hand motion objectives with high fidelity. To further enhance precision in hand-object interaction, we adopt a unified representation based on basis point sets (BPS), where end-effector positions are encoded as distances to the same BPS used for object geometry. This unified representation captures fine-grained spatial relationships between the hand and articulated object parts, and the resulting trajectories serve as targets to guide the optimization of diffusion noise, producing highly accurate interaction motion. We conduct extensive experiments demonstrating that our method outperforms existing approaches in motion quality and physical plausibility, and enables various capabilities such as object pose control, simultaneous walking and manipulation, and whole-body generation from hand-only data.
comment: Project page: https://phj128.github.io/page/CoDA/index.html
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Humans practice slow thinking before performing actual actions when handling complex tasks in the physical world. This thinking paradigm, recently, has achieved remarkable advancement in boosting Large Language Models (LLMs) to solve complex tasks in digital domains. However, the potential of slow thinking remains largely unexplored for robotic foundation models interacting with the physical world. In this work, we propose Hume: a dual-system Vision-Language-Action (VLA) model with value-guided System-2 thinking and cascaded action denoising, exploring human-like thinking capabilities of Vision-Language-Action models for dexterous robot control. System 2 of Hume implements value-Guided thinking by extending a Vision-Language-Action Model backbone with a novel value-query head to estimate the state-action value of predicted actions. The value-guided thinking is conducted by repeat sampling multiple action candidates and selecting one according to state-action value. System 1 of Hume is a lightweight reactive visuomotor policy that takes System 2 selected action and performs cascaded action denoising for dexterous robot control. At deployment time, System 2 performs value-guided thinking at a low frequency while System 1 asynchronously receives the System 2 selected action candidate and predicts fluid actions in real time. We show that Hume outperforms the existing state-of-the-art Vision-Language-Action models across multiple simulation benchmark and real-robot deployments.
MRSD: Multi-Resolution Skill Discovery for HRL Agents
Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learns these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MRSD), an HRL framework that learns multiple skill encoders at different temporal resolutions in parallel. A high-level manager dynamically selects among these skills, enabling adaptive control strategies over time. We evaluate MRSD on tasks from the DeepMind Control Suite and show that it outperforms prior state-of-the-art skill discovery and HRL methods, achieving faster convergence and higher final performance. Our findings highlight the benefits of integrating multi-resolution skills in HRL, paving the way for more versatile and efficient agents.
EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation
Transformer architectures can effectively learn language-conditioned, multi-task 3D open-loop manipulation policies from demonstrations by jointly processing natural language instructions and 3D observations. However, although both the robot policy and language instructions inherently encode rich 3D geometric structures, standard transformers lack built-in guarantees of geometric consistency, often resulting in unpredictable behavior under SE(3) transformations of the scene. In this paper, we leverage SE(3) equivariance as a key structural property shared by both policy and language, and propose EquAct-a novel SE(3)-equivariant multi-task transformer. EquAct is theoretically guaranteed to be SE(3) equivariant and consists of two key components: (1) an efficient SE(3)-equivariant point cloud-based U-net with spherical Fourier features for policy reasoning, and (2) SE(3)-invariant Feature-wise Linear Modulation (iFiLM) layers for language conditioning. To evaluate its spatial generalization ability, we benchmark EquAct on 18 RLBench simulation tasks with both SE(3) and SE(2) scene perturbations, and on 4 physical tasks. EquAct performs state-of-the-art across these simulation and physical tasks.
Structure from Collision CVPR 2025
Recent advancements in neural 3D representations, such as neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS), have enabled the accurate estimation of 3D structures from multiview images. However, this capability is limited to estimating the visible external structure, and identifying the invisible internal structure hidden behind the surface is difficult. To overcome this limitation, we address a new task called Structure from Collision (SfC), which aims to estimate the structure (including the invisible internal structure) of an object from appearance changes during collision. To solve this problem, we propose a novel model called SfC-NeRF that optimizes the invisible internal structure of an object through a video sequence under physical, appearance (i.e., visible external structure)-preserving, and keyframe constraints. In particular, to avoid falling into undesirable local optima owing to its ill-posed nature, we propose volume annealing; that is, searching for global optima by repeatedly reducing and expanding the volume. Extensive experiments on 115 objects involving diverse structures (i.e., various cavity shapes, locations, and sizes) and material properties revealed the properties of SfC and demonstrated the effectiveness of the proposed SfC-NeRF.
comment: Accepted to CVPR 2025 (Highlight). Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/sfc/
EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild
Data-driven navigation algorithms are critically dependent on large-scale, high-quality real-world data collection for successful training and robust performance in realistic and uncontrolled conditions. To enhance the growing family of navigation-related real-world datasets, we introduce EgoWalk - a dataset of 50 hours of human navigation in a diverse set of indoor/outdoor, varied seasons, and location environments. Along with the raw and Imitation Learning-ready data, we introduce several pipelines to automatically create subsidiary datasets for other navigation-related tasks, namely natural language goal annotations and traversability segmentation masks. Diversity studies, use cases, and benchmarks for the proposed dataset are provided to demonstrate its practical applicability. We openly release all data processing pipelines and the description of the hardware platform used for data collection to support future research and development in robot navigation systems.
Collision Probability Estimation for Optimization-based Vehicular Motion Planning
Many motion planning algorithms for automated driving require estimating the probability of collision (POC) to account for uncertainties in the measurement and estimation of the motion of road users. Common POC estimation techniques often utilize sampling-based methods that suffer from computational inefficiency and a non-deterministic estimation, i.e., each estimation result for the same inputs is slightly different. In contrast, optimization-based motion planning algorithms require computationally efficient POC estimation, ideally using deterministic estimation, such that typical optimization algorithms for motion planning retain feasibility. Estimating the POC analytically, however, is challenging because it depends on understanding the collision conditions (e.g., vehicle's shape) and characterizing the uncertainty in motion prediction. In this paper, we propose an approach in which we estimate the POC between two vehicles by over-approximating their shapes by a multi-circular shape approximation. The position and heading of the predicted vehicle are modelled as random variables, contrasting with the literature, where the heading angle is often neglected. We guarantee that the provided POC is an over-approximation, which is essential in providing safety guarantees, and present a computationally efficient algorithm for computing the POC estimate for Gaussian uncertainty in the position and heading. This algorithm is then used in a path-following stochastic model predictive controller (SMPC) for motion planning. With the proposed algorithm, the SMPC generates reproducible trajectories while the controller retains its feasibility in the presented test cases and demonstrates the ability to handle varying levels of uncertainty.
comment: 14 pages, 6 figures
A domain adaptation neural network for digital twin-supported fault diagnosis
Digital twins offer a promising solution to the lack of sufficient labeled data in deep learning-based fault diagnosis by generating simulated data for model training. However, discrepancies between simulation and real-world systems can lead to a significant drop in performance when models are applied in real scenarios. To address this issue, we propose a fault diagnosis framework based on Domain-Adversarial Neural Networks (DANN), which enables knowledge transfer from simulated (source domain) to real-world (target domain) data. We evaluate the proposed framework using a publicly available robotics fault diagnosis dataset, which includes 3,600 sequences generated by a digital twin model and 90 real sequences collected from physical systems. The DANN method is compared with commonly used lightweight deep learning models such as CNN, TCN, Transformer, and LSTM. Experimental results show that incorporating domain adaptation significantly improves the diagnostic performance. For example, applying DANN to a baseline CNN model improves its accuracy from 70.00% to 80.22% on real-world test data, demonstrating the effectiveness of domain adaptation in bridging the sim-to-real gap.
comment: Preprint accepted by ICCAD 2025 at Barcelona
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
Turn-taking is richly multimodal. Predictive turn-taking models (PTTMs) facilitate naturalistic human-robot interaction, yet most rely solely on speech. We introduce MM-VAP, a multimodal PTTM which combines speech with visual cues including facial expression, head pose and gaze. We find that it outperforms the state-of-the-art audio-only in videoconferencing interactions (84% vs. 79% hold/shift prediction accuracy). Unlike prior work which aggregates all holds and shifts, we group by duration of silence between turns. This reveals that through the inclusion of visual features, MM-VAP outperforms a state-of-the-art audio-only turn-taking model across all durations of speaker transitions. We conduct a detailed ablation study, which reveals that facial expression features contribute the most to model performance. Thus, our working hypothesis is that when interlocutors can see one another, visual cues are vital for turn-taking and must therefore be included for accurate turn-taking prediction. We additionally validate the suitability of automatic speech alignment for PTTM training using telephone speech. This work represents the first comprehensive analysis of multimodal PTTMs. We discuss implications for future work and make all code publicly available.
RefAV: Towards Planning-Centric Scenario Mining
Autonomous Vehicles (AVs) collect and pseudo-label terabytes of multi-modal data localized to HD maps during normal fleet testing. However, identifying interesting and safety-critical scenarios from uncurated driving logs remains a significant challenge. Traditional scenario mining techniques are error-prone and prohibitively time-consuming, often relying on hand-crafted structured queries. In this work, we revisit spatio-temporal scenario mining through the lens of recent vision-language models (VLMs) to detect whether a described scenario occurs in a driving log and, if so, precisely localize it in both time and space. To address this problem, we introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries that describe complex multi-agent interactions relevant to motion planning derived from 1000 driving logs in the Argoverse 2 Sensor dataset. We evaluate several referential multi-object trackers and present an empirical analysis of our baselines. Notably, we find that naively repurposing off-the-shelf VLMs yields poor performance, suggesting that scenario mining presents unique challenges. Our code and dataset are available at https://github.com/CainanD/RefAV/ and https://argoverse.github.io/user-guide/tasks/scenario_mining.html
SCALOFT: An Initial Approach for Situation Coverage-Based Safety Analysis of an Autonomous Aerial Drone in a Mine Environment
The safety of autonomous systems in dynamic and hazardous environments poses significant challenges. This paper presents a testing approach named SCALOFT for systematically assessing the safety of an autonomous aerial drone in a mine. SCALOFT provides a framework for developing diverse test cases, real-time monitoring of system behaviour, and detection of safety violations. Detected violations are then logged with unique identifiers for detailed analysis and future improvement. SCALOFT helps build a safety argument by monitoring situation coverage and calculating a final coverage measure. We have evaluated the performance of this approach by deliberately introducing seeded faults into the system and assessing whether SCALOFT is able to detect those faults. For a small set of plausible faults, we show that SCALOFT is successful in this.
Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning
Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by psychological theories suggesting that humans process scenes in an object-based fashion, we propose an object-centric encoder that performs semantic segmentation and visual representation generation in a coupled manner, unlike other works, which treat these as separate processes. To achieve this, we leverage the Slot Attention mechanism and use the SOLV model, pretrained in large out-of-domain datasets, to bootstrap fine-tuning on human action video data. Through simulated robotic tasks, we demonstrate that visual representations can enhance reinforcement and imitation learning training, highlighting the effectiveness of our integrated approach for semantic segmentation and encoding. Furthermore, we show that exploiting models pretrained on out-of-domain datasets can benefit this process, and that fine-tuning on datasets depicting human actions -- although still out-of-domain -- , can significantly improve performance due to close alignment with robotic tasks. These findings show the capability to reduce reliance on annotated or robot-specific action datasets and the potential to build on existing visual encoders to accelerate training and improve generalizability.
COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle
An unmanned deformable vehicle is a wheel-legged robot transforming between two configurations: vehicular and humanoid states, with different motion modes and stability characteristics. To address motion stability in multiple configurations, a center-of-mass adjustment mechanism was designed. Further, a motion stability hierarchical control algorithm was proposed, and an electromechanical model based on a two-degree-of-freedom center-of-mass adjustment mechanism was established. An unmanned-deformable-vehicle vehicular-state steady-state steering dynamics model and a gait planning kinematic model of humanoid state walking were established. A stability hierarchical control strategy was designed to realize the stability control. The results showed that the steady-state steering stability in vehicular state and the walking stability in humanoid state could be significantly improved by controlling the slider motion.
HS-SLAM: A Fast and Hybrid Strategy-Based SLAM Approach for Low-Speed Autonomous Driving
Visual-inertial simultaneous localization and mapping (SLAM) is a key module of robotics and low-speed autonomous vehicles, which is usually limited by the high computation burden for practical applications. To this end, an innovative strategy-based hybrid framework HS-SLAM is proposed to integrate the advantages of direct and feature-based methods for fast computation without decreasing the performance. It first estimates the relative positions of consecutive frames using IMU pose estimation within the tracking thread. Then, it refines these estimates through a multi-layer direct method, which progressively corrects the relative pose from coarse to fine, ultimately achieving accurate corner-based feature matching. This approach serves as an alternative to the conventional constant-velocity tracking model. By selectively bypassing descriptor extraction for non-critical frames, HS-SLAM significantly improves the tracking speed. Experimental evaluations on the EuRoC MAV dataset demonstrate that HS-SLAM achieves higher localization accuracies than ORB-SLAM3 while improving the average tracking efficiency by 15%.
comment: 11 pages, 7 figures
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires the agent to navigate by following natural instructions under partial observability, making it difficult to align perception with language. Recent methods mitigate this by imagining future scenes, yet they rely on vision-based synthesis, leading to high computational cost and redundant details. To this end, we propose to adaptively imagine key environmental semantics via \textit{language} form, enabling a more reliable and efficient strategy. Specifically, we introduce a novel Adaptive Text Dreamer (ATD), a dual-branch self-guided imagination policy built upon a large language model (LLM). ATD is designed with a human-like left-right brain architecture, where the left brain focuses on logical integration, and the right brain is responsible for imaginative prediction of future scenes. To achieve this, we fine-tune only the Q-former within both brains to efficiently activate domain-specific knowledge in the LLM, enabling dynamic updates of logical reasoning and imagination during navigation. Furthermore, we introduce a cross-interaction mechanism to regularize the imagined outputs and inject them into a navigation expert module, allowing ATD to jointly exploit both the reasoning capacity of the LLM and the expertise of the navigation model. We conduct extensive experiments on the R2R benchmark, where ATD achieves state-of-the-art performance with fewer parameters. The code is \href{https://github.com/zhangpingrui/Adaptive-Text-Dreamer}{here}.
Generalized Coordination of Partially Cooperative Urban Traffic
Vehicle-to-anything connectivity, especially for autonomous vehicles, promises to increase passenger comfort and safety of road traffic, for example, by sharing perception and driving intention. Cooperative maneuver planning uses connectivity to enhance traffic efficiency, which has, so far, been mainly considered for automated intersection management. In this article, we present a novel cooperative maneuver planning approach that is generalized to various situations found in urban traffic. Our framework handles challenging mixed traffic, that is, traffic comprising both cooperative connected vehicles and other vehicles at any distribution. Our solution is based on an optimization approach accompanied by an efficient heuristic method for high-load scenarios. We extensively evaluate the proposed planer in a distinctly realistic simulation framework and show significant efficiency gains already at a cooperation rate of 40%. Traffic throughput increases, while the average waiting time and the number of stopped vehicles are reduced, without impacting traffic safety.
comment: 8 pages, 7 figures, 3 tables, submitted to IEEE Transactions on Intelligent Vehicles (T-IV)
G-DReaM: Graph-conditioned Diffusion Retargeting across Multiple Embodiments
Motion retargeting for specific robot from existing motion datasets is one critical step in transferring motion patterns from human behaviors to and across various robots. However, inconsistencies in topological structure, geometrical parameters as well as joint correspondence make it difficult to handle diverse embodiments with a unified retargeting architecture. In this work, we propose a novel unified graph-conditioned diffusion-based motion generation framework for retargeting reference motions across diverse embodiments. The intrinsic characteristics of heterogeneous embodiments are represented with graph structure that effectively captures topological and geometrical features of different robots. Such a graph-based encoding further allows for knowledge exploitation at the joint level with a customized attention mechanisms developed in this work. For lacking ground truth motions of the desired embodiment, we utilize an energy-based guidance formulated as retargeting losses to train the diffusion model. As one of the first cross-embodiment motion retargeting methods in robotics, our experiments validate that the proposed model can retarget motions across heterogeneous embodiments in a unified manner. Moreover, it demonstrates a certain degree of generalization to both diverse skeletal structures and similar motion patterns.
Collision-free Control Barrier Functions for General Ellipsoids via Separating Hyperplane
This paper presents a novel collision avoidance method for general ellipsoids based on control barrier functions (CBFs) and separating hyperplanes. First, collision-free conditions for general ellipsoids are analytically derived using the concept of dual cones. These conditions are incorporated into the CBF framework by extending the system dynamics of controlled objects with separating hyperplanes, enabling efficient and reliable collision avoidance. The validity of the proposed collision-free CBFs is rigorously proven, ensuring their effectiveness in enforcing safety constraints. The proposed method requires only single-level optimization, significantly reducing computational time compared to state-of-the-art methods. Numerical simulations and real-world experiments demonstrate the effectiveness and practicality of the proposed algorithm.
Learning Unified Force and Position Control for Legged Loco-Manipulation
Robotic loco-manipulation tasks often involve contact-rich interactions with the environment, requiring the joint modeling of contact force and robot position. However, recent visuomotor policies often focus solely on learning position or force control, overlooking their co-learning. In this work, we propose the first unified policy for legged robots that jointly models force and position control learned without reliance on force sensors. By simulating diverse combinations of position and force commands alongside external disturbance forces, we use reinforcement learning to learn a policy that estimates forces from historical robot states and compensates for them through position and velocity adjustments. This policy enables a wide range of manipulation behaviors under varying force and position inputs, including position tracking, force application, force tracking, and compliant interactions. Furthermore, we demonstrate that the learned policy enhances trajectory-based imitation learning pipelines by incorporating essential contact information through its force estimation module, achieving approximately 39.5% higher success rates across four challenging contact-rich manipulation tasks compared to position-control policies. Extensive experiments on both a quadrupedal manipulator and a humanoid robot validate the versatility and robustness of the proposed policy across diverse scenarios.
comment: website: https://unified-force.github.io/
GET: Goal-directed Exploration and Targeting for Large-Scale Unknown Environments
Object search in large-scale, unstructured environments remains a fundamental challenge in robotics, particularly in dynamic or expansive settings such as outdoor autonomous exploration. This task requires robust spatial reasoning and the ability to leverage prior experiences. While Large Language Models (LLMs) offer strong semantic capabilities, their application in embodied contexts is limited by a grounding gap in spatial reasoning and insufficient mechanisms for memory integration and decision consistency.To address these challenges, we propose GET (Goal-directed Exploration and Targeting), a framework that enhances object search by combining LLM-based reasoning with experience-guided exploration. At its core is DoUT (Diagram of Unified Thought), a reasoning module that facilitates real-time decision-making through a role-based feedback loop, integrating task-specific criteria and external memory. For repeated tasks, GET maintains a probabilistic task map based on a Gaussian Mixture Model, allowing for continual updates to object-location priors as environments evolve.Experiments conducted in real-world, large-scale environments demonstrate that GET improves search efficiency and robustness across multiple LLMs and task settings, significantly outperforming heuristic and LLM-only baselines. These results suggest that structured LLM integration provides a scalable and generalizable approach to embodied decision-making in complex environments.
Spatial RoboGrasp: Generalized Robotic Grasping Control Policy
Achieving generalizable and precise robotic manipulation across diverse environments remains a critical challenge, largely due to limitations in spatial perception. While prior imitation-learning approaches have made progress, their reliance on raw RGB inputs and handcrafted features often leads to overfitting and poor 3D reasoning under varied lighting, occlusion, and object conditions. In this paper, we propose a unified framework that couples robust multimodal perception with reliable grasp prediction. Our architecture fuses domain-randomized augmentation, monocular depth estimation, and a depth-aware 6-DoF Grasp Prompt into a single spatial representation for downstream action planning. Conditioned on this encoding and a high-level task prompt, our diffusion-based policy yields precise action sequences, achieving up to 40% improvement in grasp success and 45% higher task success rates under environmental variation. These results demonstrate that spatially grounded perception, paired with diffusion-based imitation learning, offers a scalable and robust solution for general-purpose robotic grasping.
Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt
Recent robot learning methods commonly rely on imitation learning from massive robotic dataset collected with teleoperation. When facing a new task, such methods generally require collecting a set of new teleoperation data and finetuning the policy. Furthermore, the teleoperation data collection pipeline is also tedious and expensive. Instead, human is able to efficiently learn new tasks by just watching others do. In this paper, we introduce a novel two-stage framework that utilizes human demonstrations to learn a generalizable robot policy. Such policy can directly take human demonstration video as a prompt and perform new tasks without any new teleoperation data and model finetuning at all. In the first stage, we train video generation model that captures a joint representation for both the human and robot demonstration video data using cross-prediction. In the second stage, we fuse the learned representation with a shared action space between human and robot using a novel prototypical contrastive loss. Empirical evaluations on real-world dexterous manipulation tasks show the effectiveness and generalization capabilities of our proposed method.
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation
Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications and providing practical implementations in autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.
comment: This work has been submitted for possible publication
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations that make it advantageous for OPE: (1) prevents over-regularization by subtracting the score of the behavior policy during guidance, and (2) generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion. Experiments on the D4RL and OpenAI Gym benchmarks show substantial improvement in mean squared error, correlation, and regret metrics compared to state-of-the-art OPE methods.
Interactive OT Gym: A Reinforcement Learning-Based Interactive Optical tweezer (OT)-Driven Microrobotics Simulation Platform ICRA 2025
Optical tweezers (OT) offer unparalleled capabilities for micromanipulation with submicron precision in biomedical applications. However, controlling conventional multi-trap OT to achieve cooperative manipulation of multiple complex-shaped microrobots in dynamic environments poses a significant challenge. To address this, we introduce Interactive OT Gym, a reinforcement learning (RL)-based simulation platform designed for OT-driven microrobotics. Our platform supports complex physical field simulations and integrates haptic feedback interfaces, RL modules, and context-aware shared control strategies tailored for OT-driven microrobot in cooperative biological object manipulation tasks. This integration allows for an adaptive blend of manual and autonomous control, enabling seamless transitions between human input and autonomous operation. We evaluated the effectiveness of our platform using a cell manipulation task. Experimental results show that our shared control system significantly improves micromanipulation performance, reducing task completion time by approximately 67% compared to using pure human or RL control alone and achieving a 100% success rate. With its high fidelity, interactivity, low cost, and high-speed simulation capabilities, Interactive OT Gym serves as a user-friendly training and testing environment for the development of advanced interactive OT-driven micromanipulation systems and control algorithms. For more details on the project, please see our website https://sites.google.com/view/otgym
comment: ICRA 2025
ManiTaskGen: A Comprehensive Task Generator for Benchmarking and Improving Vision-Language Agents on Embodied Decision-Making
Building embodied agents capable of accomplishing arbitrary tasks is a core objective towards achieving embodied artificial general intelligence (E-AGI). While recent work has advanced such general robot policies, their training and evaluation are often limited to tasks within specific scenes, involving restricted instructions and scenarios. Existing benchmarks also typically rely on manual annotation of limited tasks in a few scenes. We argue that exploring the full spectrum of feasible tasks within any given scene is crucial, as they provide both extensive benchmarks for evaluation and valuable resources for agent improvement. Towards this end, we introduce ManiTaskGen, a novel system that automatically generates comprehensive, diverse, feasible mobile manipulation tasks for any given scene. The generated tasks encompass both process-based, specific instructions (e.g., "move object from X to Y") and outcome-based, abstract instructions (e.g., "clear the table"). We apply ManiTaskGen to both simulated and real-world scenes, demonstrating the validity and diversity of the generated tasks. We then leverage these tasks to automatically construct benchmarks, thoroughly evaluating the embodied decision-making capabilities of agents built upon existing vision-language models (VLMs). Furthermore, we propose a simple yet effective method that utilizes ManiTaskGen tasks to enhance embodied decision-making. Overall, this work presents a universal task generation framework for arbitrary scenes, facilitating both benchmarking and improvement of embodied decision-making agents.
Automating eHMI Action Design with LLMs for Automated Vehicle Communication
The absence of explicit communication channels between automated vehicles (AVs) and other road users requires the use of external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies employ predefined text messages and manually designed actions to perform these messages, which limits the real-world deployment of eHMIs, where adaptability in dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could potentially serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising a total of 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset validates that LLMs can translate intended messages into actions close to a human level, particularly for reasoning-enabled LLMs. (3) We introduce two automated raters, Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that the VLM aligns with human preferences yet varies across eHMI modalities.
Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion
We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to perform standing, walking, running, and smooth transitions within a single recurrent policy. A compact reward routing mechanism dynamically activates gait-specific objectives based on a one-hot gait ID, mitigating reward interference and supporting stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum progressively introduces gait complexity and expands command space over multiple phases. In simulation, the policy successfully achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work provides a scalable, reference-free solution toward versatile and naturalistic humanoid control across diverse modes and environments.
OmniIndoor3D: Comprehensive Indoor 3D Reconstruction
We propose a novel framework for comprehensive indoor 3D reconstruction using Gaussian representations, called OmniIndoor3D. This framework enables accurate appearance, geometry, and panoptic reconstruction of diverse indoor scenes captured by a consumer-level RGB-D camera. Since 3DGS is primarily optimized for photorealistic rendering, it lacks the precise geometry critical for high-quality panoptic reconstruction. Therefore, OmniIndoor3D first combines multiple RGB-D images to create a coarse 3D reconstruction, which is then used to initialize the 3D Gaussians and guide the 3DGS training. To decouple the optimization conflict between appearance and geometry, we introduce a lightweight MLP that adjusts the geometric properties of 3D Gaussians. The introduced lightweight MLP serves as a low-pass filter for geometry reconstruction and significantly reduces noise in indoor scenes. To improve the distribution of Gaussian primitives, we propose a densification strategy guided by panoptic priors to encourage smoothness on planar surfaces. Through the joint optimization of appearance, geometry, and panoptic reconstruction, OmniIndoor3D provides comprehensive 3D indoor scene understanding, which facilitates accurate and robust robotic navigation. We perform thorough evaluations across multiple datasets, and OmniIndoor3D achieves state-of-the-art results in appearance, geometry, and panoptic reconstruction. We believe our work bridges a critical gap in indoor 3D reconstruction. The code will be released at: https://ucwxb.github.io/OmniIndoor3D/
Visual Loop Closure Detection Through Deep Graph Consensus
Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. As false positive loop closures significantly degrade downstream pose graph estimates, verifying a large number of candidates in online simultaneous localization and mapping scenarios is constrained by limited time and compute resources. While most deep loop closure detection approaches only operate on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among nodes of the clique, our method yields high-precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust, regardless of the type of deep feature encodings used, and exhibits higher computational efficiency compared to classical geometric verification baselines. We release our code, supplementary material, and keyframe data at https://loopgnn.cs.uni-freiburg.de.
MIND-Stack: Modular, Interpretable, End-to-End Differentiability for Autonomous Navigation
Developing robust, efficient navigation algorithms is challenging. Rule-based methods offer interpretability and modularity but struggle with learning from large datasets, while end-to-end neural networks excel in learning but lack transparency and modularity. In this paper, we present MIND-Stack, a modular software stack consisting of a localization network and a Stanley Controller with intermediate human interpretable state representations and end-to-end differentiability. Our approach enables the upstream localization module to reduce the downstream control error, extending its role beyond state estimation. Unlike existing research on differentiable algorithms that either lack modules of the autonomous stack to span from sensor input to actuator output or real-world implementation, MIND-Stack offers both capabilities. We conduct experiments that demonstrate the ability of the localization module to reduce the downstream control loss through its end-to-end differentiability while offering better performance than state-of-the-art algorithms. We showcase sim-to-real capabilities by deploying the algorithm on a real-world embedded autonomous platform with limited computation power and demonstrate simultaneous training of both the localization and controller towards one goal. While MIND-Stack shows good results, we discuss the incorporation of additional modules from the autonomous navigation pipeline in the future, promising even greater stability and performance in the next iterations of the framework.
comment: 8 pages. Submitted to the IEEE Intelligent Vehicles Symposium (IV 2025), Romania
Real-World Deployment of Cloud Autonomous Mobility System Using 5G Networks for Outdoor and Indoor Environments
The growing complexity of both outdoor and indoor mobility systems demands scalable, cost-effective, and reliable perception and communication frameworks. This work presents the real-world deployment and evaluation of a Cloud Autonomous Mobility (CAM) system that leverages distributed sensor nodes connected via 5G networks, which integrates LiDAR- and camera-based perception at infrastructure units, cloud computing for global information fusion, and Ultra-Reliable Low Latency Communications (URLLC) to enable real-time situational awareness and autonomous operation. The CAM system is deployed in two distinct environments: a dense urban roundabout and a narrow indoor hospital corridor. Field experiments show improved traffic monitoring, hazard detection, and asset management capabilities. The paper also discusses practical deployment challenges and shares key insights for scaling CAM systems. The results highlight the potential of cloud-based infrastructure perception to advance both outdoor and indoor intelligent transportation systems.
comment: This paper has been submitted to IEEE Intelligent Transportation Systems Magazine
Convergent Functions, Divergent Forms
We introduce LOKI, a compute-efficient framework for co-designing morphologies and control policies that generalize across unseen tasks. Inspired by biological adaptation -- where animals quickly adjust to morphological changes -- our method overcomes the inefficiencies of traditional evolutionary and quality-diversity algorithms. We propose learning convergent functions: shared control policies trained across clusters of morphologically similar designs in a learned latent space, drastically reducing the training cost per design. Simultaneously, we promote divergent forms by replacing mutation with dynamic local search, enabling broader exploration and preventing premature convergence. The policy reuse allows us to explore 780$\times$ more designs using 78% fewer simulation steps and 40% less compute per design. Local competition paired with a broader search results in a diverse set of high-performing final morphologies. Using the UNIMAL design space and a flat-terrain locomotion task, LOKI discovers a rich variety of designs -- ranging from quadrupeds to crabs, bipedals, and spinners -- far more diverse than those produced by prior work. These morphologies also transfer better to unseen downstream tasks in agility, stability, and manipulation domains (e.g., 2$\times$ higher reward on bump and push box incline tasks). Overall, our approach produces designs that are both diverse and adaptable, with substantially greater sample efficiency than existing co-design methods. (Project website: https://loki-codesign.github.io/)
PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation
Fine-grained robot manipulation, such as lifting and rotating a bottle to display the label on the cap, requires robust reasoning about object parts and their relationships with intended tasks. Despite recent advances in training general-purpose robot manipulation policies guided by language instructions, there is a notable lack of large-scale datasets for fine-grained manipulation tasks with part-level instructions and diverse 3D object instances annotated with part-level labels. In this work, we introduce PartInstruct, the first large-scale benchmark for training and evaluating fine-grained robot manipulation models using part-level instructions. PartInstruct comprises 513 object instances across 14 categories, each annotated with part-level information, and 1302 fine-grained manipulation tasks organized into 16 task classes. Our training set consists of over 10,000 expert demonstrations synthesized in a 3D simulator, where each demonstration is paired with a high-level task instruction, a chain of base part-based skill instructions, and ground-truth 3D information about the object and its parts. Additionally, we designed a comprehensive test suite to evaluate the generalizability of learned policies across new states, objects, and tasks. We evaluated several state-of-the-art robot manipulation approaches, including end-to-end vision-language policy learning and bi-level planning models for robot manipulation on our benchmark. The experimental results reveal that current models struggle to robustly ground part concepts and predict actions in 3D space, and face challenges when manipulating object parts in long-horizon tasks.
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Large Language Models (LLMs) enable various applications on edge devices such as smartphones, wearables, and embodied robots. However, their deployment often depends on expensive cloud-based APIs, creating high operational costs, which limit access for smaller organizations and raise sustainability concerns. Certain LLMs can be deployed on-device, offering a cost-effective solution with reduced latency and improved privacy. Yet, limited computing resources constrain the size and accuracy of models that can be deployed, necessitating a collaborative design between edge and cloud. We propose a fast and cost-effective speculative edge-cloud decoding framework with a large target model on the server and a small draft model on the device. By introducing early exits in the target model, tokens are generated mid-verification, allowing the client to preemptively draft subsequent tokens before final verification, thus utilizing idle time and enhancing parallelism between edge and cloud. Using an NVIDIA Jetson Nano (client) and an A100 GPU (server) with Vicuna-68M (draft) and Llama2-7B (target) models, our method achieves up to a 35% reduction in latency compared to cloud-based autoregressive decoding, with an additional 11% improvement from preemptive drafting. To demonstrate real-world applicability, we deploy our method on the Unitree Go2 quadruped robot using Vision-Language Model (VLM) based control, achieving a 21% speedup over traditional cloud-based autoregressive decoding. These results demonstrate the potential of our framework for real-time LLM and VLM applications on resource-constrained edge devices.
CogAD: Cognitive-Hierarchy Guided End-to-End Autonomous Driving
While end-to-end autonomous driving has advanced significantly, prevailing methods remain fundamentally misaligned with human cognitive principles in both perception and planning. In this paper, we propose CogAD, a novel end-to-end autonomous driving model that emulates the hierarchical cognition mechanisms of human drivers. CogAD implements dual hierarchical mechanisms: global-to-local context processing for human-like perception and intent-conditioned multi-mode trajectory generation for cognitively-inspired planning. The proposed method demonstrates three principal advantages: comprehensive environmental understanding through hierarchical perception, robust planning exploration enabled by multi-level planning, and diverse yet reasonable multi-modal trajectory generation facilitated by dual-level uncertainty modeling. Extensive experiments on nuScenes and Bench2Drive demonstrate that CogAD achieves state-of-the-art performance in end-to-end planning, exhibiting particular superiority in long-tail scenarios and robust generalization to complex real-world driving conditions.
Towards Human-Like Trajectory Prediction for Autonomous Driving: A Behavior-Centric Approach
Predicting the trajectories of vehicles is crucial for the development of autonomous driving (AD) systems, particularly in complex and dynamic traffic environments. In this study, we introduce HiT (Human-like Trajectory Prediction), a novel model designed to enhance trajectory prediction by incorporating behavior-aware modules and dynamic centrality measures. Unlike traditional methods that primarily rely on static graph structures, HiT leverages a dynamic framework that accounts for both direct and indirect interactions among traffic participants. This allows the model to capture the subtle yet significant influences of surrounding vehicles, enabling more accurate and human-like predictions. To evaluate HiT's performance, we conducted extensive experiments using diverse and challenging real-world datasets, including NGSIM, HighD, RounD, ApolloScape, and MoCAD++. The results demonstrate that HiT consistently outperforms other top models across multiple metrics, particularly excelling in scenarios involving aggressive driving behaviors. This research presents a significant step forward in trajectory prediction, offering a more reliable and interpretable approach for enhancing the safety and efficiency of fully autonomous driving systems.
Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees
In many real-world applications of regression, conditional probability estimation, and uncertainty quantification, exploiting symmetries rooted in physics or geometry can dramatically improve generalization and sample efficiency. While geometric deep learning has made significant empirical advances by incorporating group-theoretic structure, less attention has been given to statistical learning guarantees. In this paper, we introduce an equivariant representation learning framework that simultaneously addresses regression, conditional probability estimation, and uncertainty quantification while providing first-of-its-kind non-asymptotic statistical learning guarantees. Grounded in operator and group representation theory, our framework approximates the spectral decomposition of the conditional expectation operator, building representations that are both equivariant and disentangled along independent symmetry subgroups. Empirical evaluations on synthetic datasets and real-world robotics applications confirm the potential of our approach, matching or outperforming existing equivariant baselines in regression while additionally providing well-calibrated parametric uncertainty estimates.
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planing, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our methods achieves 45.0 PDMS.
comment: 4pages
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
comment: 9 pages, 8 figures, 5 tables
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: 1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements; 2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; 3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; 4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and 5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.
comment: 27 pages, 21 figures, with added experiments and analysis, website: https://ha-vln-project.vercel.app/
A Universal Flexible Neuromorphic Tactile System with Multithreshold Strategy
Extremely increased unstructured data brought by the large-scale intelligent sensing devices application have big challenges not only in data storing and processing but also power consumption surging. Therefore, to improve energy efficiency and processing speed, a new generation system structure and construction strategy is necessary. Most biological nervous systems, especially the tactile system, have a good flexibility and data processing performance with low power usage. Inspired from this mechanism, to optimize the intelligent system, we report a universal fully flexible neuromorphic perception system with a strong compatibility and multi-threshold signal processing strategy by mimicking tactile nervous system. Peak signal accumulated from spike encoded sensor signal in front-end processing unit can be used for recognition task directly since the bionic synaptic plasticity. Compared with conventional systems, power consumption of our system significantly decreases about 1 order of magnitude in a same recognition task. What is more, the design of voltage-based matching circuit and multithreshold processing circuit provide an excellent compatibility and multi-signal processing capability in our system. In feasibility verification, our system can output trend of different input signals (continuous signal and frequency signal etc.) accurately and have a high recognition accuracy of 90% in the symbol pattern and 90% in Morse code. These properties of our neuromorphic system show a great application potential in intelligent devices and bionic robots.
Diffusion Predictive Control with Constraints
Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their ability to handle unseen and dynamic conditions where novel constraints not represented in the training data must be satisfied. To overcome this limitation, we propose diffusion predictive control with constraints (DPCC), an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. DPCC incorporates model-based projections into the denoising process of a trained trajectory diffusion model and uses constraint tightening to account for model mismatch. This allows us to generate constraint-satisfying, dynamically feasible, and goal-reaching trajectories for predictive control. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
comment: Accepted to L4DC 2025. Code: https://github.com/ralfroemer99/dpcc. 14 pages, 3 figures, 3 tables
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate $100\%$ success rate (vs $82\%$ baseline) and $73$-step average episode length (vs $158$-step baseline). The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning. Theoretical analysis and ablations validate our design choices.
Joint Magnetometer-IMU Calibration via Maximum A Posteriori Estimation
This paper presents a new approach for jointly calibrating magnetometers and inertial measurement units, focusing on improving calibration accuracy and computational efficiency. The proposed method formulates the calibration problem as a maximum a posteriori estimation problem, treating both the calibration parameters and orientation trajectory of the sensors as unknowns. This formulation enables efficient optimization with closed-form derivatives. The method is compared against two state-of-the-art approaches in terms of computational complexity and estimation accuracy. Simulation results demonstrate that the proposed method achieves lower root mean square error in calibration parameters while maintaining competitive computational efficiency. Further validation through real-world experiments confirms the practical benefits of our approach: it effectively reduces position drift in a magnetic field-aided inertial navigation system by more than a factor of two on most datasets. Moreover, the proposed method calibrated 30 magnetometers in less than 2 minutes. The contributions include a new calibration method, an analysis of existing methods, and a comprehensive empirical evaluation. Datasets and algorithms are made publicly available to promote reproducible research.
comment: Fix a typo
FlashBack: Consistency Model-Accelerated Shared Autonomy
Shared autonomy is an enabling technology that provides users with control authority over robots that would otherwise be difficult if not impossible to directly control. Yet, standard methods make assumptions that limit their adoption in practice-for example, prior knowledge of the user's goals or the objective (i.e., reward) function that they wish to optimize, knowledge of the user's policy, or query-level access to the user during training. Diffusion-based approaches to shared autonomy do not make such assumptions and instead only require access to demonstrations of desired behaviors, while allowing the user to maintain control authority. However, these advantages have come at the expense of high computational complexity, which has made real-time shared autonomy all but impossible. To overcome this limitation, we propose Consistency Shared Autonomy (CSA), a shared autonomy framework that employs a consistency model-based formulation of diffusion. Key to CSA is that it employs the distilled probability flow of ordinary differential equations (PF ODE) to generate high-fidelity samples in a single step. This results in inference speeds significantly than what is possible with previous diffusion-based approaches to shared autonomy, enabling real-time assistance in complex domains with only a single function evaluation. Further, by intervening on flawed actions at intermediate states of the PF ODE, CSA enables varying levels of assistance. We evaluate CSA on a variety of challenging simulated and real-world robot control problems, demonstrating significant improvements over state-of-the-art methods both in terms of task performance and computational efficiency.
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that our Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance. Our code will be made public soon.
Sky-Drive: A Distributed Multi-Agent Simulation Platform for Human-AI Collaborative and Socially-Aware Future Transportation
Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. However, existing simulators do not yet fully meet the needs of future transportation research-particularly in enabling effective human-AI collaboration and modeling socially-aware driving agents. This paper introduces Sky-Drive, a novel distributed multi-agent simulation platform that addresses these limitations through four key innovations: (a) a distributed architecture for synchronized simulation across multiple terminals; (b) a multi-modal human-in-the-loop framework integrating diverse sensors to collect rich behavioral data; (c) a human-AI collaboration mechanism supporting continuous and adaptive knowledge exchange; and (d) a digital twin framework for constructing high-fidelity virtual replicas of real-world transportation environments. Sky-Drive supports diverse applications such as autonomous vehicle-human road users interaction modeling, human-in-the-loop training, socially-aware reinforcement learning, personalized driving development, and customized scenario generation. Future extensions will incorporate foundation models for context-aware decision support and hardware-in-the-loop testing for real-world validation. By bridging scenario generation, data collection, algorithm training, and hardware integration, Sky-Drive has the potential to become a foundational platform for the next generation of human-centered and socially-aware autonomous transportation systems research. The demo video and code are available at:https://sky-lab-uw.github.io/Sky-Drive-website/
comment: 14 pages, 7 figures
Predicate Invention for Bilevel Planning AAAI 2023
Efficient planning in continuous state and action spaces is fundamentally hard, even when the transition model is deterministic and known. One way to alleviate this challenge is to perform bilevel planning with abstractions, where a high-level search for abstract plans is used to guide planning in the original transition space. Previous work has shown that when state abstractions in the form of symbolic predicates are hand-designed, operators and samplers for bilevel planning can be learned from demonstrations. In this work, we propose an algorithm for learning predicates from demonstrations, eliminating the need for manually specified state abstractions. Our key idea is to learn predicates by optimizing a surrogate objective that is tractable but faithful to our real efficient-planning objective. We use this surrogate objective in a hill-climbing search over predicate sets drawn from a grammar. Experimentally, we show across four robotic planning environments that our learned abstractions are able to quickly solve held-out tasks, outperforming six baselines. Code: https://tinyurl.com/predicators-release
comment: AAAI 2023. This version fixes pseudocode in Algorithm 2 (thanks Yichao Liang!)
A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation
We present an integrated (or end-to-end) framework for the Real2Sim2Real problem of manipulating deformable linear objects (DLOs) based on visual perception. Working with a parameterised set of DLOs, we use likelihood-free inference (LFI) to compute the posterior distributions for the physical parameters using which we can approximately simulate the behaviour of each specific DLO. We use these posteriors for domain randomisation while training, in simulation, object-specific visuomotor policies (i.e. assuming only visual and proprioceptive sensory) for a DLO reaching task, using model-free reinforcement learning. We demonstrate the utility of this approach by deploying sim-trained DLO manipulation policies in the real world in a zero-shot manner, i.e. without any further fine-tuning. In this context, we evaluate the capacity of a prominent LFI method to perform fine classification over the parametric set of DLOs, using only visual and proprioceptive data obtained in a dynamic manipulation trajectory. We then study the implications of the resulting domain distributions in sim-based policy learning and real-world performance.
CoBOS: Constraint-Based Online Scheduler for Human-Robot Collaboration IROS 2024
Assembly processes involving humans and robots are challenging scenarios because the individual activities and access to shared workspace have to be coordinated. Fixed robot programs leave no room to diverge from a fixed protocol. Working on such a process can be stressful for the user and lead to ineffective behavior or failure. We propose a novel approach of online constraint-based scheduling in a reactive execution control framework facilitating behavior trees called CoBOS. This allows the robot to adapt to uncertain events such as delayed activity completions and activity selection (by the human). The user will experience less stress as the robotic coworkers adapt their behavior to best complement the human-selected activities to complete the common task. In addition to the improved working conditions, our algorithm leads to increased efficiency, even in highly uncertain scenarios. We evaluate our algorithm using a probabilistic simulation study with 56000 experiments. We outperform all other compared methods by a margin of 4-10%. Initial real robot experiments using a Franka Emika Panda robot and human tracking based on HTC Vive VR gloves look promising.
comment: 7 pages, 8 figures, accepted at IEEE IROS 2024
Human-Centered Development of Guide Dog Robots: Quiet and Stable Locomotion Control
A quadruped robot is a promising system that can offer assistance comparable to that of dog guides due to its similar form factor. However, various challenges remain in making these robots a reliable option for blind and low-vision (BLV) individuals. Among these challenges, noise and jerky motion during walking are critical drawbacks of existing quadruped robots. While these issues have largely been overlooked in guide dog robot research, our interviews with guide dog handlers and trainers revealed that acoustic and physical disturbances can be particularly disruptive for BLV individuals, who rely heavily on environmental sounds for navigation. To address these issues, we developed a novel walking controller for slow stepping and smooth foot swing/contact while maintaining human walking speed, as well as robust and stable balance control. The controller integrates with a perception system to facilitate locomotion over non-flat terrains, such as stairs. Our controller was extensively tested on the Unitree Go1 robot and, when compared with other control methods, demonstrated significant noise reduction -- half of the default locomotion controller. In this study, we adopt a mixed-methods approach to evaluate its usability with BLV individuals. In our indoor walking experiments, participants compared our controller to the robot's default controller. Results demonstrated superior acceptance of our controller, highlighting its potential to improve the user experience of guide dog robots. Video demonstration (best viewed with audio) available at: https://youtu.be/8-pz_8Hqe6s.
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, visuospatial cognition - reasoning about spatial layouts, relations, and dynamics - remains a significant challenge. Existing models often lack the necessary architectural components and specialized training data for fine-grained spatial understanding. We introduce ViCA2 (Visuospatial Cognitive Assistant 2), a novel MLLM designed to enhance spatial reasoning. ViCA2 features a dual vision encoder architecture integrating SigLIP for semantics and Hiera for spatial structure, coupled with a token ratio control mechanism for efficiency. We also developed ViCA-322K, a new large-scale dataset with over 322,000 spatially grounded question-answer pairs for targeted instruction tuning. On the challenging VSI-Bench benchmark, our ViCA2-7B model achieves a state-of-the-art average score of 56.8, significantly surpassing larger open-source models (e.g., LLaVA-NeXT-Video-72B, 40.9) and leading proprietary models (Gemini-1.5 Pro, 45.4). This demonstrates the effectiveness of our approach in achieving strong visuospatial intelligence with a compact model. We release ViCA2, its codebase, and the ViCA-322K dataset to facilitate further research.
comment: 26 pages, 19 figures, 4 tables. Code, models, and datasets are available at our project page: https://github.com/nkkbr/ViCA. This is a draft technical report. At the request of Professor Hidetoshi Shimodaira, his name has been removed from the author list
Visuospatial Cognitive Assistant
Video-based spatial cognition is vital for robotics and embodied AI but challenges current Vision-Language Models (VLMs). This paper makes two key contributions. First, we introduce ViCA (Visuospatial Cognitive Assistant)-322K, a diverse dataset of 322,003 QA pairs from real-world indoor videos (ARKitScenes, ScanNet, ScanNet++), offering supervision for 3D metadata-grounded queries and video-based complex reasoning. Second, we develop ViCA-7B, fine-tuned on ViCA-322K, which achieves new state-of-the-art on all eight VSI-Bench tasks, outperforming existing models, including larger ones (e.g., +26.1 on Absolute Distance). For interpretability, we present ViCA-Thinking-2.68K, a dataset with explicit reasoning chains, and fine-tune ViCA-7B to create ViCA-7B-Thinking, a model that articulates its spatial reasoning. Our work highlights the importance of targeted data and suggests paths for improved temporal-spatial modeling. We release all resources to foster research in robust visuospatial intelligence.
comment: 31 pages, 10 figures, 6 tables. The implementation and fine-tuned model (ViCA-7B), along with detailed documentation, are publicly available at https://huggingface.co/nkkbr/ViCA. This is a draft technical report. At Professor Hidetoshi Shimodaira's request, his name has been removed from the author list
Efficient Robotic Policy Learning via Latent Space Backward Planning ICML 2025
Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io
comment: Accepted by ICML 2025
Drones Guiding Drones: Cooperative Navigation of a Less-Equipped Micro Aerial Vehicle in Cluttered Environments IROS 2024
Reliable deployment of Unmanned Aerial Vehicles (UAVs) in cluttered unknown environments requires accurate sensors for Global Navigation Satellite System (GNSS)-denied localization and obstacle avoidance. Such a requirement limits the usage of cheap and micro-scale vehicles with constrained payload capacity if industrial-grade reliability and precision are required. This paper investigates the possibility of offloading the necessity to carry heavy sensors to another member of the UAV team while preserving the desired capability of the smaller robot intended for exploring narrow passages. A novel cooperative guidance framework offloading the sensing requirements from a minimalistic secondary UAV to a superior primary UAV is proposed. The primary UAV constructs a dense occupancy map of the environment and plans collision-free paths for both UAVs to ensure reaching the desired secondary UAV's goals even in areas not accessible by the bigger robot. The primary UAV guides the secondary UAV to follow the planned path while tracking the UAV using Light Detection and Ranging (LiDAR)-based relative localization. The proposed approach was verified in real-world experiments with a heterogeneous team of a 3D LiDAR-equipped primary UAV and a micro-scale camera-equipped secondary UAV moving autonomously through unknown cluttered GNSS-denied environments with the proposed framework running fully on board the UAVs.
comment: presented at IROS 2024, accepted version
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles
Cyber-physical systems like autonomous vehicles are tested in simulation before deployment, using domain-specific programs for scenario specification. To aid the testing of autonomous vehicles in simulation, we design a natural language interface, using an instruction-following large language model, to assist a non-coding domain expert in synthesising the desired scenarios and vehicle behaviours. We show that using it to convert utterances to the symbolic program is feasible, despite the very small training dataset. Human experiments show that dialogue is critical to successful simulation generation, leading to a 4.5 times higher success rate than a generation without engaging in extended conversation.
comment: 12 pages, 5 figures, 2 tables
Toward Unified Practices in Trajectory Prediction Research on Bird's-Eye-View Datasets
The availability of high-quality datasets is crucial for the development of behavior prediction algorithms in autonomous vehicles. This paper highlights the need to standardize the use of certain datasets for motion forecasting research to simplify comparative analysis and proposes a set of tools and practices to achieve this. Drawing on extensive experience and a comprehensive review of current literature, we summarize our proposals for preprocessing, visualization, and evaluation in the form of an open-sourced toolbox designed for researchers working on trajectory prediction problems. The clear specification of necessary preprocessing steps and evaluation metrics is intended to alleviate development efforts and facilitate the comparison of results across different studies. The toolbox is available at: https://github.com/westny/dronalize.
comment: https://github.com/westny/dronalize
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning ICRA 2025
This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLM) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter reduction techniques ultimately impair the performance of the language foundation model during the action instruction tuning phase, making them unsuitable for this purpose. We introduce a novel latency-free quadruped MLLM model, dubbed QUART-Online, designed to enhance inference efficiency without degrading the performance of the language foundation model. By incorporating Action Chunk Discretization (ACD), we compress the original action representation space, mapping continuous action values onto a smaller set of discrete representative vectors while preserving critical information. Subsequently, we fine-tune the MLLM to integrate vision, language, and compressed actions into a unified semantic space. Experimental results demonstrate that QUART-Online operates in tandem with the existing MLLM system, achieving real-time inference in sync with the underlying controller frequency, significantly boosting the success rate across various tasks by 65%. Our project page is https://quart-online.github.io.
comment: Accepted to ICRA 2025; Github page: https://quart-online.github.io
Map Space Belief Prediction for Manipulation-Enhanced Mapping
Searching for objects in cluttered environments requires selecting efficient viewpoints and manipulation actions to remove occlusions and reduce uncertainty in object locations, shapes, and categories. In this work, we address the problem of manipulation-enhanced semantic mapping, where a robot has to efficiently identify all objects in a cluttered shelf. Although Partially Observable Markov Decision Processes~(POMDPs) are standard for decision-making under uncertainty, representing unstructured interactive worlds remains challenging in this formalism. To tackle this, we define a POMDP whose belief is summarized by a metric-semantic grid map and propose a novel framework that uses neural networks to perform map-space belief updates to reason efficiently and simultaneously about object geometries, locations, categories, occlusions, and manipulation physics. Further, to enable accurate information gain analysis, the learned belief updates should maintain calibrated estimates of uncertainty. Therefore, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs) to learn a belief propagation model that generalizes to novel scenarios and provides confidence-calibrated predictions for unknown areas. Our experiments show that our novel POMDP planner improves map completeness and accuracy over existing methods in challenging simulations and successfully transfers to real-world cluttered shelves in zero-shot fashion.
comment: 14 pages, 10 figures; To appear at RSS 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we propose FSD (From Seeing to Doing), a novel vision-language model that generates intermediate representations through spatial relationship reasoning, providing fine-grained guidance for robotic manipulation. Our approach combines a hierarchical data pipeline for training with a self-consistency mechanism that aligns spatial coordinates with visual signals. Through extensive experiments, we comprehensively validated FSD's capabilities in both "seeing" and "doing," achieving outstanding performance across 8 benchmarks for general spatial reasoning and embodied reference abilities, as well as on our proposed more challenging benchmark VABench. We also verified zero-shot capabilities in robot manipulation, demonstrating significant performance improvements over baseline methods in both SimplerEnv and real robot settings. Experimental results show that FSD achieves 40.6% success rate in SimplerEnv and 72% success rate across 8 real-world tasks, outperforming the strongest baseline by 30%.
comment: Our project homepage: https://embodied-fsd.github.io/
Pointing the Way: Refining Radar-Lidar Localization Using Learned ICP Weights
This paper presents a novel deep-learning-based approach to improve localizing radar measurements against lidar maps. This radar-lidar localization leverages the benefits of both sensors; radar is resilient against adverse weather, while lidar produces high-quality maps in clear conditions. However, owing in part to the unique artefacts present in radar measurements, radar-lidar localization has struggled to achieve comparable performance to lidar-lidar systems, preventing it from being viable for autonomous driving. This work builds on ICP-based radar-lidar localization by including a learned preprocessing step that weights radar points based on high-level scan information. To train the weight-generating network, we present a novel, stand-alone, open-source differentiable ICP library. The learned weights facilitate ICP by filtering out harmful radar points related to artefacts, noise, and even vehicles on the road. Combining an analytical approach with a learned weight reduces overall localization errors and improves convergence in radar-lidar ICP results run on real-world autonomous driving data. Our code base is publicly available to facilitate reproducibility and extensions.
comment: 8 pages, 4 figures, 1 table. Accepted to 22nd Conference on Robots and Vision (CRV) 2025
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot RL (MRRL) policies with realistic robot dynamics and safety constraints, supporting parallelization and hardware acceleration. Our generalizable learning interface integrates easily with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation. Our code is available at https://github.com/GT-STAR-Lab/JaxRobotarium.
comment: 22 pages, 14 figures, 10 tables. https://github.com/GT-STAR-Lab/JaxRobotarium
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid robot with dual dexterous hands. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at https://x-humanoid-robomind.github.io/.
comment: 21 pages, 17 figures, Robotics: Science and Systems 2025
Chance-Constrained Sampling-Based MPC for Collision Avoidance in Uncertain Dynamic Environments
Navigating safely in dynamic and uncertain environments is challenging due to uncertainties in perception and motion. This letter presents the Chance-Constrained Unscented Model Predictive Path Integral (C2U-MPPI) framework, a robust sampling-based Model Predictive Control (MPC) algorithm that addresses these challenges by leveraging the U-MPPI control strategy with integrated probabilistic chance constraints, enabling more reliable and efficient navigation under uncertainty. Unlike gradient-based MPC methods, our approach (i) avoids linearization of system dynamics by directly applying non-convex and nonlinear chance constraints, enabling more accurate and flexible optimization, and (ii) enhances computational efficiency by leveraging a deterministic form of probabilistic constraints and employing a layered dynamic obstacle representation, enabling real-time handling of multiple obstacles. Extensive experiments in simulated and real-world human-shared environments validate the effectiveness of our algorithm against baseline methods, showcasing its capability to generate feasible trajectories and control inputs that adhere to system dynamics and constraints in dynamic settings, enabled by unscented-based sampling strategy and risk-sensitive trajectory evaluation. A supplementary video is available at: https://youtu.be/FptAhvJlQm8.
comment: This paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L), May 2025. It comprises 9 pages, 3 figures, and 7 tables
Event-based Stereo Depth Estimation: A Survey
Stereopsis has widespread appeal in robotics as it is the predominant way by which living beings perceive depth to navigate our 3D world. Event cameras are novel bio-inspired sensors that detect per-pixel brightness changes asynchronously, with very high temporal resolution and high dynamic range, enabling machine perception in high-speed motion and broad illumination conditions. The high temporal precision also benefits stereo matching, making disparity (depth) estimation a popular research area for event cameras ever since its inception. Over the last 30 years, the field has evolved rapidly, from low-latency, low-power circuit design to current deep learning (DL) approaches driven by the computer vision community. The bibliography is vast and difficult to navigate for non-experts due its highly interdisciplinary nature. Past surveys have addressed distinct aspects of this topic, in the context of applications, or focusing only on a specific class of techniques, but have overlooked stereo datasets. This survey provides a comprehensive overview, covering both instantaneous stereo and long-term methods suitable for simultaneous localization and mapping (SLAM), along with theoretical and empirical comparisons. It is the first to extensively review DL methods as well as stereo datasets, even providing practical suggestions for creating new benchmarks to advance the field. The main advantages and challenges faced by event-based stereo depth estimation are also discussed. Despite significant progress, challenges remain in achieving optimal performance in not only accuracy but also efficiency, a cornerstone of event-based computing. We identify several gaps and propose future research directions. We hope this survey inspires future research in this area, by serving as an accessible entry point for newcomers, as well as a practical guide for seasoned researchers in the community.
comment: 28 pages, 24 figures, 7 tables
Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks
We provide a communication- and computation-efficient method for distributed submodular optimization in robot mesh networks. Submodularity is a property of diminishing returns that arises in active information gathering such as mapping, surveillance, and target tracking. Our method, Resource-Aware distributed Greedy (RAG), introduces a new distributed optimization paradigm that enables scalable and near-optimal action coordination. To this end, RAG requires each robot to make decisions based only on information received from and about their neighbors. In contrast, the current paradigms allow the relay of information about all robots across the network. As a result, RAG's decision-time scales linearly with the network size, while state-of-the-art near-optimal submodular optimization algorithms scale cubically. We also characterize how the designed mesh-network topology affects RAG's approximation performance. Our analysis implies that sparser networks favor scalability without proportionally compromising approximation performance: while RAG's decision time scales linearly with network size, the gain in approximation performance scales sublinearly. We demonstrate RAG's performance in simulated scenarios of area detection with up to 45 robots, simulating realistic robot-to-robot (r2r) communication speeds such as the 0.25 Mbps speed of the Digi XBee 3 Zigbee 3.0. In the simulations, RAG enables real-time planning, up to three orders of magnitude faster than competitive near-optimal algorithms, while also achieving superior mean coverage performance. To enable the simulations, we extend the high-fidelity and photo-realistic simulator AirSim by integrating a scalable collaborative autonomy pipeline to tens of robots and simulating r2r communication delays. Our code is available at https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim.
comment: Accepted to IEEE Transactions on Robotics
Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
Multiagent Systems
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i)Visual Quality-semantic alignment with human posters, (ii)Textual Coherence-language fluency, (iii)Holistic Assessment-six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv)PaperQuiz-the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a)Parser distills the paper into a structured asset library; the (b)Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c)Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs-though visually appealing at first glance-often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g. based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. It transforms a 22-page paper into a finalized yet editable .pptx poster - all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.
comment: Project Page: https://github.com/Paper2Poster/Paper2Poster
Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks
Agent-Based Models (ABMs) are powerful tools for studying emergent properties in complex systems. In ABMs, agent behaviors are governed by local interactions and stochastic rules. However, these rules are, in general, non-differentiable, limiting the use of gradient-based methods for optimization, and thus integration with real-world data. We propose a novel framework to learn a differentiable surrogate of any ABM by observing its generated data. Our method combines diffusion models to capture behavioral stochasticity and graph neural networks to model agent interactions. Distinct from prior surrogate approaches, our method introduces a fundamental shift: rather than approximating system-level outputs, it models individual agent behavior directly, preserving the decentralized, bottom-up dynamics that define ABMs. We validate our approach on two ABMs (Schelling's segregation model and a Predator-Prey ecosystem) showing that it replicates individual-level patterns and accurately forecasts emergent dynamics beyond training. Our results demonstrate the potential of combining diffusion models and graph learning for data-driven ABM simulation.
Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery
Focused Ultrasound Ablation Surgery (FUAS) has emerged as a promising non-invasive therapeutic modality, valued for its safety and precision. Nevertheless, its clinical implementation entails intricate tasks such as multimodal image interpretation, personalized dose planning, and real-time intraoperative decision-making processes that demand intelligent assistance to improve efficiency and reliability. We introduce FUAS-Agents, an autonomous agent system that leverages the multimodal understanding and tool-using capabilities of large language models (LLMs). By integrating patient profiles and MRI data, FUAS-Agents orchestrates a suite of specialized medical AI tools, including segmentation, treatment dose prediction, and clinical guideline retrieval, to generate personalized treatment plans comprising MRI image, dose parameters, and therapeutic strategies. We evaluate the system in a uterine fibroid treatment scenario. Human assessment by four senior FUAS experts indicates that 82.5%, 82.5%, 87.5%, and 97.5% of the generated plans were rated 4 or above (on a 5-point scale) in terms of completeness, accuracy, fluency, and clinical compliance, respectively. These results demonstrate the potential of LLM-driven agents in enhancing decision-making across complex clinical workflows, and exemplify a translational paradigm that combines general-purpose models with specialized expert systems to solve practical challenges in vertical healthcare domains.
Large Language Models Miss the Multi-Agent Mark
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inf-marl.
GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender Simulation
Current personalized recommender systems predominantly rely on static offline data for algorithm design and evaluation, significantly limiting their ability to capture long-term user preference evolution and social influence dynamics in real-world scenarios. To address this fundamental challenge, we propose a high-fidelity social simulation platform integrating human-like cognitive agents and dynamic social interactions to realistically simulate user behavior evolution under recommendation interventions. Specifically, the system comprises a population of Sim-User Agents, each equipped with a five-layer cognitive architecture that encapsulates key psychological mechanisms, including episodic memory, affective state transitions, adaptive preference learning, and dynamic trust-risk assessments. In particular, we innovatively introduce the Intimacy--Curiosity--Reciprocity--Risk (ICR2) motivational engine grounded in psychological and sociological theories, enabling more realistic user decision-making processes. Furthermore, we construct a multilayer heterogeneous social graph (GGBond Graph) supporting dynamic relational evolution, effectively modeling users' evolving social ties and trust dynamics based on interest similarity, personality alignment, and structural homophily. During system operation, agents autonomously respond to recommendations generated by typical recommender algorithms (e.g., Matrix Factorization, MultVAE, LightGCN), deciding whether to consume, rate, and share content while dynamically updating their internal states and social connections, thereby forming a stable, multi-round feedback loop. This innovative design transcends the limitations of traditional static datasets, providing a controlled, observable environment for evaluating long-term recommender effects.
Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games
We consider two-player zero-sum concurrent stochastic games (CSGs) played on graphs with reachability and safety objectives. These include degenerate classes such as Markov decision processes or turn-based stochastic games, which can be solved by linear or quadratic programming; however, in practice, value iteration (VI) outperforms the other approaches and is the most implemented method. Similarly, for CSGs, this practical performance makes VI an attractive alternative to the standard theoretical solution via the existential theory of reals. VI starts with an under-approximation of the sought values for each state and iteratively updates them, traditionally terminating once two consecutive approximations are $\epsilon$-close. However, this stopping criterion lacks guarantees on the precision of the approximation, which is the goal of this work. We provide bounded (a.k.a. interval) VI for CSGs: it complements standard VI with a converging sequence of over-approximations and terminates once the over- and under-approximations are $\epsilon$-close.
comment: Full version of the corresponding LICS'25 paper
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents' actions in a multi-agent system aligns with the reverse process in diffusion models--a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, significantly outperforming prior world models in terms of final return and sample efficiency, including MAMuJoCo and Bi-DexHands. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research.
Generalized Coordination of Partially Cooperative Urban Traffic
Vehicle-to-anything connectivity, especially for autonomous vehicles, promises to increase passenger comfort and safety of road traffic, for example, by sharing perception and driving intention. Cooperative maneuver planning uses connectivity to enhance traffic efficiency, which has, so far, been mainly considered for automated intersection management. In this article, we present a novel cooperative maneuver planning approach that is generalized to various situations found in urban traffic. Our framework handles challenging mixed traffic, that is, traffic comprising both cooperative connected vehicles and other vehicles at any distribution. Our solution is based on an optimization approach accompanied by an efficient heuristic method for high-load scenarios. We extensively evaluate the proposed planer in a distinctly realistic simulation framework and show significant efficiency gains already at a cooperation rate of 40%. Traffic throughput increases, while the average waiting time and the number of stopped vehicles are reduced, without impacting traffic safety.
comment: 8 pages, 7 figures, 3 tables, submitted to IEEE Transactions on Intelligent Vehicles (T-IV)
MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems
As large language models (LLMs) are increasingly deployed in healthcare, ensuring their safety, particularly within collaborative multi-agent configurations, is paramount. In this paper we introduce MedSentry, a benchmark comprising 5 000 adversarial medical prompts spanning 25 threat categories with 100 subthemes. Coupled with this dataset, we develop an end-to-end attack-defense evaluation pipeline to systematically analyze how four representative multi-agent topologies (Layers, SharedPool, Centralized, and Decentralized) withstand attacks from 'dark-personality' agents. Our findings reveal critical differences in how these architectures handle information contamination and maintain robust decision-making, exposing their underlying vulnerability mechanisms. For instance, SharedPool's open information sharing makes it highly susceptible, whereas Decentralized architectures exhibit greater resilience thanks to inherent redundancy and isolation. To mitigate these risks, we propose a personality-scale detection and correction mechanism that identifies and rehabilitates malicious agents, restoring system safety to near-baseline levels. MedSentry thus furnishes both a rigorous evaluation framework and practical defense strategies that guide the design of safer LLM-based multi-agent systems in medical domains.
AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models
Nuclear waste management requires rigorous regulatory compliance assessment, demanding advanced decision-support systems capable of addressing complex legal, environmental, and safety considerations. This paper presents a multi-agent Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) with document retrieval mechanisms to enhance decision accuracy through structured agent collaboration. Through a structured 10-round discussion model, agents collaborate to assess regulatory compliance and safety requirements while maintaining document-grounded responses. Implemented on consumer-grade hardware, the system leverages Llama 3.2 and mxbai-embed-large-v1 embeddings for efficient retrieval and semantic representation. A case study of a proposed temporary nuclear waste storage site near Winslow, Arizona, demonstrates the framework's effectiveness. Results show the Regulatory Agent achieves consistently higher relevance scores in maintaining alignment with legal frameworks, while the Safety Agent effectively manages complex risk assessments requiring multifaceted analysis. The system demonstrates progressive improvement in agreement rates between agents across discussion rounds while semantic drift decreases, indicating enhanced decision-making consistency and response coherence. The system ensures regulatory decisions remain factually grounded, dynamically adapting to evolving regulatory frameworks through real-time document retrieval. By balancing automated assessment with human oversight, this framework offers a scalable and transparent approach to regulatory governance. These findings underscore the potential of AI-driven, multi-agent systems in advancing evidence-based, accountable, and adaptive decision-making for high-stakes environmental management scenarios.
Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems
Recent advancements in Large Language Models (LLMs) have enabled the emergence of multi-agent systems where LLMs interact, collaborate, and make decisions in shared environments. While individual model behavior has been extensively studied, the dynamics of peer influence in such systems remain underexplored. In this paper, we investigate herd behavior, the tendency of agents to align their outputs with those of their peers, within LLM-based multi-agent interactions. We present a series of controlled experiments that reveal how herd behaviors are shaped by multiple factors. First, we show that the gap between self-confidence and perceived confidence in peers significantly impacts an agent's likelihood to conform. Second, we find that the format in which peer information is presented plays a critical role in modulating the strength of herd behavior. Finally, we demonstrate that the degree of herd behavior can be systematically controlled, and that appropriately calibrated herd tendencies can enhance collaborative outcomes. These findings offer new insights into the social dynamics of LLM-based systems and open pathways for designing more effective and adaptive multi-agent collaboration frameworks.
comment: Preprint
Improving flocking behaviors in street networks with vision
We improve a flocking model on street networks introduced in a previous paper. We expand the field of vision of walkers, making the model more realistic. Under such conditions, we obtain groups of walkers whose gathering times and robustness to break ups are better than previous results. We explain such improvements because the alignment rule with vision guaranties walkers do not split into divergent directions at intersections anymore, and because the attraction rule with vision gathers distant groups. This paves the way to a better understanding of events where walkers have collective decentralized goals, like protests.
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System ACL 2025
The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address the limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.
comment: Accepted by ACL 2025 Main Conference
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge
Large Language Models (LLMs) have significantly advanced medical question-answering by leveraging extensive clinical data and medical literature. However, the rapid evolution of medical knowledge and the labor-intensive process of manually updating domain-specific resources pose challenges to the reliability of these systems. To address this, we introduce Agentic Medical Graph-RAG (AMG-RAG), a comprehensive framework that automates the construction and continuous updating of medical knowledge graphs, integrates reasoning, and retrieves current external evidence, such as PubMed and WikiSearch. By dynamically linking new findings and complex medical concepts, AMG-RAG not only improves accuracy but also enhances interpretability in medical queries. Evaluations on the MEDQA and MEDMCQA benchmarks demonstrate the effectiveness of AMG-RAG, achieving an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger. Notably, these improvements are achieved without increasing computational overhead, highlighting the critical role of automated knowledge graph generation and external evidence retrieval in delivering up-to-date, trustworthy medical insights.
Sequential Resource Trading Using Comparison-Based Gradient Estimation
Autonomous agents interact with other autonomous agents and humans of unknown preferences to share resources in their environment. We explore sequential trading for resource allocation in a setting where two greedily rational agents sequentially trade resources from a finite set of categories. Each agent has a utility function that depends on the amount of resources it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent's utility function, and the responding agent only accepts offers that improve its utility. To facilitate cooperation between an autonomous agent and another autonomous agent or a human, we present an algorithm for the offering agent to estimate the responding agent's gradient (preferences) and make offers based on previous acceptance or rejection responses. The algorithm's goal is to reach a Pareto-optimal resource allocation state while ensuring that the utilities of both agents improve after every accepted trade. The algorithm estimates the responding agent's gradient by leveraging the rejected offers and the greedy rationality assumption, to prune the space of potential gradients. We show that, after the algorithm makes a finite number of rejected offers, the algorithm either finds a mutually beneficial trade or certifies that the current state is epsilon-weakly Pareto optimal. We compare the proposed algorithm against various baselines in continuous and discrete trading scenarios and show that it improves the societal benefit with fewer offers. Additionally, we validate these findings in a user study with human participants, where the algorithm achieves high performance in scenarios with high resource conflict due to aligned agent goals.
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We believe this work provides insights into multi-agent system safety and contributes to the development of robust, trustworthy AI interactions.
comment: This paper has been accepted to IEEE IRI 2025
Voting or Consensus? Decision-Making in Multi-Agent Debate
Much of the success of multi-agent debates depends on carefully choosing the right parameters. The decision-making protocol stands out as it can highly impact final model answers, depending on how decisions are reached. Systematic comparison of decision protocols is difficult because many studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making influences different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time - the decision protocol - to analyze how different methods affect the collaboration between agents and measure differences in knowledge and reasoning tasks. Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks compared to other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods, All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Additionally, we further extend ReMA to multi-turn interaction settings, leveraging turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs. Our code can be found in https://github.com/ziyuwan/ReMA-public
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot RL (MRRL) policies with realistic robot dynamics and safety constraints, supporting parallelization and hardware acceleration. Our generalizable learning interface integrates easily with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation. Our code is available at https://github.com/GT-STAR-Lab/JaxRobotarium.
comment: 22 pages, 14 figures, 10 tables. https://github.com/GT-STAR-Lab/JaxRobotarium
Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation
This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property without requiring convergence of the estimated state by the state observer, which plays a key role in establishing the off-policy learning control approaches. By value iteration and policy iteration schemes, the adaptive dynamic programming based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.
comment: 16 pages, 5 figures
Empowering Scientific Workflows with Federated Agents
Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, the agentic frameworks used to build these systems have not previously enabled use with research cyberinfrastructure. Here we introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, Academy supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control. We present microbenchmark results that demonstrate high performance and scalability in HPC environments. To demonstrate the breadth of applications that can be supported by agentic workflow designs, we also present case studies in materials discovery, decentralized learning, and information extraction in which agents are deployed across diverse HPC systems.
Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks
We provide a communication- and computation-efficient method for distributed submodular optimization in robot mesh networks. Submodularity is a property of diminishing returns that arises in active information gathering such as mapping, surveillance, and target tracking. Our method, Resource-Aware distributed Greedy (RAG), introduces a new distributed optimization paradigm that enables scalable and near-optimal action coordination. To this end, RAG requires each robot to make decisions based only on information received from and about their neighbors. In contrast, the current paradigms allow the relay of information about all robots across the network. As a result, RAG's decision-time scales linearly with the network size, while state-of-the-art near-optimal submodular optimization algorithms scale cubically. We also characterize how the designed mesh-network topology affects RAG's approximation performance. Our analysis implies that sparser networks favor scalability without proportionally compromising approximation performance: while RAG's decision time scales linearly with network size, the gain in approximation performance scales sublinearly. We demonstrate RAG's performance in simulated scenarios of area detection with up to 45 robots, simulating realistic robot-to-robot (r2r) communication speeds such as the 0.25 Mbps speed of the Digi XBee 3 Zigbee 3.0. In the simulations, RAG enables real-time planning, up to three orders of magnitude faster than competitive near-optimal algorithms, while also achieving superior mean coverage performance. To enable the simulations, we extend the high-fidelity and photo-realistic simulator AirSim by integrating a scalable collaborative autonomy pipeline to tens of robots and simulating r2r communication delays. Our code is available at https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim.
comment: Accepted to IEEE Transactions on Robotics
Systems and Control (CS)
Quasi Steady-State Frequency
Accurate frequency estimation is critical for the control, monitoring and protection of electrical power systems, in particular, of systems with a high penetration of power electronics. This paper introduces the novel concept of Quasi Steady-State (QSS) frequency as a quantity that fills the gap between stationary and instantaneous frequency. QSS frequency coincides with the fundamental frequency of an AC voltage in any stationary conditions, including unbalanced and non-sinusoidal, and is able to capture the time-varying fundamental frequency in transient conditions. The paper also proposes a metric borrowed from fluid dynamics, namely, the time derivative of the circulation, to define the scope of validity of the QSS frequency. Analytical examples as well as a case study based on a fully-fledged EMT model of the IEEE 39-bus system serve to illustrate, respectively, the properties of the QSS frequency and its behavior in transient conditions.
WiCAL: Accurate Wi-Fi-Based 3D Localization Enabled by Collaborative Antenna Arrays
Accurate 3D localization is essential for realizing advanced sensing functionalities in next-generation Wi-Fi communication systems. This study investigates the potential of multistatic localization in Wi-Fi networks through the deployment of multiple cooperative antenna arrays. The collaborative gain offered by these arrays is twofold: (i) intra-array coherent gain at the wavelength scale among antenna elements, and (ii) inter-array cooperative gain across arrays. To evaluate the feasibility and performance of this approach, we develop WiCAL (Wi-Fi Collaborative Antenna Localization), a system built upon commercial Wi-Fi infrastructure equipped with uniform rectangular arrays. These arrays are driven by multiplexing embedded radio frequency chains available in standard access points or user devices, thereby eliminating the need for sophisticated, costly, and power-hungry multi-transceiver modules typically required in multiple-input and multiple-output systems. To address phase offsets introduced by RF chain multiplexing, we propose a three-stage, fine-grained phase alignment scheme to synchronize signals across antenna elements within each array. A bidirectional spatial smoothing MUSIC algorithm is employed to estimate angles of arrival (AoAs) and mitigate performance degradation caused by correlated interference. To further exploit inter-array cooperative gain, we elaborate on the synchronization mechanism among distributed URAs, which enables direct position determination by bypassing intermediate angle estimation. Once synchronized, the distributed URAs effectively form a virtual large-scale array, significantly enhancing spatial resolution and localization accuracy.
comment: 14 page, 22 figures
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
We extend the generalized global Hartman-Grobman theorem by Kvalheim and Sontag for flows to a case of asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 3 pages, comments are welcome
Distributed equilibrium seeking in aggregative games: linear convergence under singular perturbations lens
We present a fully-distributed algorithm for Nash equilibrium seeking in aggregative games over networks. The proposed scheme endows each agent with a gradient-based scheme equipped with a tracking mechanism to locally reconstruct the aggregative variable, which is not available to the agents. We show that our method falls into the framework of singularly perturbed systems, as it involves the interconnection between a fast subsystem - the global information reconstruction dynamics - with a slow one concerning the optimization of the local strategies. This perspective plays a key role in analyzing the scheme with a constant stepsize, and in proving its linear convergence to the Nash equilibrium in strongly monotone games with local constraints. By exploiting the flexibility of our aggregative variable definition (not necessarily the arithmetic average of the agents' strategy), we show the efficacy of our algorithm on a realistic voltage support case study for the smart grid.
comment: Presented at the 2024 IEEE 63rd Conference on Decision and Control (CDC), Milan, Italy. Accepted manuscript version, 7 pages. arXiv admin note: text overlap with arXiv:2210.14547
Stochastic Geometry-Based Performance Evaluation for LEO Satellite-Assisted Space Caching
To achieve the Internet of Things (IoT) vision,Mobile Edge Computing (MEC) is a promising technology aimed at providing low-latency computing services to user equipment (UE). However, terrestrial MEC network struggles to provide service to UEs in remote and maritime region. Low Earth Orbit (LEO) satellite networks have the potential to overcome geographical restrictions and provide seamless global coverage for UEs. In this paper, we provide the first attempt to use stochastic geometry to investigate the performance of implementing space caching with LEO satellites (SATs) in the MEC network. We study a LEO satellite-assisted space caching MEC network, and LEO SATs can be equipped with servers to enable space caching, with the advantage of seamless coverage to assist terrestrial CSs for serving UEs in remote or maritime reigon. Using stochastic geometry and queuing theory, we establish an analytical framework for this MEC network. Meanwhile, we develop association strategies for UEs to connect with LEO SATs or CSs and utilize stochastic geometry to derive uplink and downlink coverage probabilities, considering the diversity of task and service types. On this basis, we employ the queuing theory to calculate the average delay to evaluate the system performance. Through Monte Carlo simulations and numerical results, the system performance is evaluated. The results show the potential of SAT spatial caching in improving the performance of the MEC network. Additionally, our results reveal useful insights such as the significant impact of the altitude and number of LEO SATs on the average delay of the network, providing helpful system-level recommendations for the design and configuration of the space-caching MEC network.
comment: 15 pages, 12 figures, be accepted by IEEE IoTJ
Active Learning-Enhanced Dual Control for Angle-Only Initial Relative Orbit Determination
Accurate relative orbit determination is a key challenge in modern space operations, particularly when relying on angle-only measurements. The inherent observability limitations of this approach make initial state estimation difficult, impacting mission safety and performance. This work explores the use of active learning (AL) techniques to enhance observability by dynamically designing the input excitation signal offline and at runtime. Our approach leverages AL to design the input signal dynamically, enhancing the observability of the system without requiring additional hardware or predefined maneuvers. We incorporate a dual control technique to ensure target tracking while maintaining observability. The proposed method is validated through numerical simulations, demonstrating its effectiveness in estimating the initial relative state of the chaser and target spacecrafts and its robustness to various initial relative distances and observation periods.
Output Regulation of Linear Systems with Non-periodic Non-smooth Exogenous Signals
We address the output regulation problem of linear systems with non-smooth and non-periodic exogenous signals. Specifically, we first formulate and solve the full-information problem by designing a state-feedback controller. We study the solvability of the regulator equations, providing a new non-resonance condition. We then focus on the error-feedback problem, for which we design a (non-robust) internal model leveraging the concept of canonical realisation and applying a high-gain method for the stabilisation of the closed-loop system under the minimum-phase assumption. Finally, we study the regulation problem involving model parameter uncertainties. Drawing ideas from both hybrid and time-varying (smooth) output regulation, we propose two methods to establish an internal model that is robust to uncertainties. The first method is an extension of the hybrid internal model, while the second relies on a new concept of immersion. In this non-smooth case, the immersion is established based on integrals rather than derivatives. The effectiveness of the proposed solutions is illustrated by a circuit regulation example.
comment: 18 pages, 6 figures
A domain adaptation neural network for digital twin-supported fault diagnosis
Digital twins offer a promising solution to the lack of sufficient labeled data in deep learning-based fault diagnosis by generating simulated data for model training. However, discrepancies between simulation and real-world systems can lead to a significant drop in performance when models are applied in real scenarios. To address this issue, we propose a fault diagnosis framework based on Domain-Adversarial Neural Networks (DANN), which enables knowledge transfer from simulated (source domain) to real-world (target domain) data. We evaluate the proposed framework using a publicly available robotics fault diagnosis dataset, which includes 3,600 sequences generated by a digital twin model and 90 real sequences collected from physical systems. The DANN method is compared with commonly used lightweight deep learning models such as CNN, TCN, Transformer, and LSTM. Experimental results show that incorporating domain adaptation significantly improves the diagnostic performance. For example, applying DANN to a baseline CNN model improves its accuracy from 70.00% to 80.22% on real-world test data, demonstrating the effectiveness of domain adaptation in bridging the sim-to-real gap.
comment: Preprint accepted by ICCAD 2025 at Barcelona
Multi-Mode Process Control Using Multi-Task Inverse Reinforcement Learning
In the era of Industry 4.0 and smart manufacturing, process systems engineering must adapt to digital transformation. While reinforcement learning offers a model-free approach to process control, its applications are limited by the dependence on accurate digital twins and well-designed reward functions. To address these limitations, this paper introduces a novel framework that integrates inverse reinforcement learning (IRL) with multi-task learning for data-driven, multi-mode control design. Using historical closed-loop data as expert demonstrations, IRL extracts optimal reward functions and control policies. A latent-context variable is incorporated to distinguish modes, enabling the training of mode-specific controllers. Case studies on a continuous stirred tank reactor and a fed-batch bioreactor validate the effectiveness of this framework in handling multi-mode data and training adaptable controllers.
Efficient Spectral Control of Partially Observed Linear Dynamical Systems
We propose a new method for the problem of controlling linear dynamical systems under partial observation and adversarial disturbances. Our new algorithm, Double Spectral Control (DSC), matches the best known regret guarantees while exponentially improving runtime complexity over previous approaches in its dependence on the system's stability margin. Our key innovation is a two-level spectral approximation strategy, leveraging double convolution with a universal basis of spectral filters, enabling efficient and accurate learning of the best linear dynamical controllers.
Analysis of Joint Radar and Communication in Disaster Scenarios
With the increasing frequency and intensity of natural disasters, there is a necessity for advanced technologies that can provide reliable situational awareness and communication. Conventional systems are often inadequate due to unreliable infrastructure, power grid failures, high investment costs and scalability challenges. This paper explores the potential of ad-hoc mesh joint radar and communication (JRC) networks as a scalable, resilient, energy-efficient solution for disaster management that can operate independently of conventional infrastructure. The proposed JRC network enhances disaster response by integrating target detection (such as identifying vital signs, hazardous leaks, and fires) with communication capabilities to ensure efficient information dissemination under intense clutter conditions. Key performance metrics, including data rate, Signal-to-Clutter and Noise Ratio (SCNR), probability of detection, and false alarm rate, are used to assess performance. An optimization approach is proposed to provide an energy-efficient resource allocation scheme. The results show the performance of ad-hoc mesh JRC systems, underscoring their potential to enhance disaster management efforts by addressing unique operational challenges.
comment: 6 pages
Uncertainty Partitioning with Probabilistic Feasibility and Performance Guarantees for Chance-Constrained Optimization
We propose a novel distribution-free scheme to solve optimization problems where the goal is to minimize the expected value of a cost function subject to probabilistic constraints. Unlike standard sampling-based methods, our idea consists of partitioning the uncertainty domain in a user-defined number of sets, enabling more flexibility in the trade-off between conservatism and computational complexity. We provide sufficient conditions to ensure that our approximated problem is feasible for the original stochastic program, in terms of chance constraint satisfaction. In addition, we perform a rigorous performance analysis, by quantifying the distance between the optimal values of the original and the approximated problem. We show that our approach is tractable for optimization problems that include model predictive control of piecewise affine systems, and we demonstrate the benefits of our approach, in terms of the trade-off between conservatism and computational complexity, on a numerical example.
comment: Submitted to IEEE Transactions on Automatic Control
COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle
An unmanned deformable vehicle is a wheel-legged robot transforming between two configurations: vehicular and humanoid states, with different motion modes and stability characteristics. To address motion stability in multiple configurations, a center-of-mass adjustment mechanism was designed. Further, a motion stability hierarchical control algorithm was proposed, and an electromechanical model based on a two-degree-of-freedom center-of-mass adjustment mechanism was established. An unmanned-deformable-vehicle vehicular-state steady-state steering dynamics model and a gait planning kinematic model of humanoid state walking were established. A stability hierarchical control strategy was designed to realize the stability control. The results showed that the steady-state steering stability in vehicular state and the walking stability in humanoid state could be significantly improved by controlling the slider motion.
Research on a Two-Layer Demand Response Framework for Electric Vehicle Users and Aggregators Based on LLMs
The widespread adoption of electric vehicles (EVs) has increased the importance of demand response in smart grids. This paper proposes a two-layer demand response optimization framework for EV users and aggregators, leveraging large language models (LLMs) to balance electricity supply and demand and optimize energy utilization during EV charging. The upper-layer model, focusing on the aggregator, aims to maximize profits by adjusting retail electricity prices. The lower-layer model targets EV users, using LLMs to simulate charging demands under varying electricity prices and optimize both costs and user comfort. The study employs a multi-threaded LLM decision generator to dynamically analyze user behavior, charging preferences, and psychological factors. The framework utilizes the PSO method to optimize electricity prices, ensuring user needs are met while increasing aggregator profits. Simulation results show that the proposed model improves EV charging efficiency, alleviates peak power loads, and stabilizes smart grid operations.
Effective Fixed-Time Control for Constrained Nonlinear System
In this paper, we tackle the state transformation problem in non-strict full state-constrained systems by introducing an adaptive fixed-time control method, utilizing a one-to-one asymmetric nonlinear mapping auxiliary system. Additionally, we develop a class of multi-threshold event-triggered control strategies that facilitate autonomous controller updates, substantially reducing communication resource consumption. Notably, the self-triggered strategy distinguishes itself from other strategies by obviating the need for continuous real-time monitoring of the controller's state variables. By accurately forecasting the subsequent activation instance, this strategy significantly optimizes the efficiency of the control system. Moreover, our theoretical analysis demonstrates that the semi-global practical fixed-time stability (SPFTS) criterion guarantees both tracking accuracy and closed-loop stability under state constraints, with convergence time independent of initial conditions. Finally, simulation results reveal that the proposed method significantly decreases the frequency of control command updates while maintaining tracking accuracy.
Collision-free Control Barrier Functions for General Ellipsoids via Separating Hyperplane
This paper presents a novel collision avoidance method for general ellipsoids based on control barrier functions (CBFs) and separating hyperplanes. First, collision-free conditions for general ellipsoids are analytically derived using the concept of dual cones. These conditions are incorporated into the CBF framework by extending the system dynamics of controlled objects with separating hyperplanes, enabling efficient and reliable collision avoidance. The validity of the proposed collision-free CBFs is rigorously proven, ensuring their effectiveness in enforcing safety constraints. The proposed method requires only single-level optimization, significantly reducing computational time compared to state-of-the-art methods. Numerical simulations and real-world experiments demonstrate the effectiveness and practicality of the proposed algorithm.
Physics-Informed Neural Network for Cross-Domain Predictive Control of Tapered Amplifier Thermal Stabilization
Thermally induced laser noise poses a critical limitation to the sensitivity of quantum sensor arrays employing ultra-stable amplified lasers, primarily stemming from nonlinear gain-temperature coupling effects in tapered amplifiers (TAs). To address this challenge, we present a robust intelligent control strategy that synergistically integrates an encoder-decoder physics-informed gated recurrent unit (PI-GRU) network with a model predictive control (MPC) framework. Our methodology incorporates physical soft constraints into the neural network architecture, yielding a predictive model with enhanced physical consistency that demonstrates robust extrapolation capabilities beyond the training data distribution. Leveraging the PI-GRU model's accurate multi-step predictive performance, we implement a hierarchical parallel MPC architecture capable of real-time thermal instability compensation. This hybrid approach achieves cross-domain consistent thermal stabilization in TAs under diverse laser power operations. Remarkably, while trained exclusively on low-power operational data, our system demonstrates exceptional generalization, improving prediction accuracy by 58.2% and temperature stability by 69.1% in previously unseen high-power operating regimes, as experimentally validated. The novel synchronization of physics-informed neural networks with advanced MPC frameworks presented in this work establishes a groundbreaking paradigm for addressing robustness challenges in cross-domain predictive control applications, overcoming conventional modeling limitations.
Data-Driven Existence and Design of Target Output Controllers
Target output controllers aim at regulating a system's target outputs by placing poles of a suitable subsystem using partial state feedback, where full state controllability is not required. This paper establishes existence conditions for such controllers using input and partial state data, where the system dynamics are unknown. The approach bypasses traditional system identification steps and leverages the intrinsic structure of historical data to certify controller existence and synthesize a suitable feedback gain. Analytical characterizations are provided, ensuring that the resulting closed-loop system satisfies desired performance objectives such as pole placement or stabilization. Data-driven algorithms are then proposed to design target output controllers directly from data without identifying system parameters, where controllers with the order matching the number of target outputs and with minimum-order augmented target outputs are both addressed. Furthermore, a separation principle is revealed, decoupling the design of target output controllers from state observers. This enables the development of data-driven observer-based controllers that integrate estimation and control. Numerical examples validate the theoretical results and demonstrate the efficacy of the proposed approach.
On Kernel Design for Regularized Volterra Series Identification of Wiener-Hammerstein Systems
There have been increasing interests on the Volterra series identification with the kernel-based regularization method. The major difficulties are on the kernel design and efficiency of the corresponding implementation. In this paper, we first assume that the underlying system to be identified is the Wiener-Hammerstein (WH) system with polynomial nonlinearity. We then show how to design kernels with nonzero off-diagonal blocks for Volterra maps by taking into account the prior knowledge of the linear blocks and the structure of WH systems. Moreover, exploring the structure of the designed kernels leads to the same computational complexity as the state-of-the-art result, i.e., $O(N^3)$, where $N$ is the sample size, but with a significant difference that the proposed kernels are designed in a direct and flexible way. In addition, for a special case of the kernel and a class of widely used input signals, further exploring the separable structure of the output kernel matrix can lower the computational complexity from $O(N^3)$ to $O(N\gamma^2)$, where $\gamma$ is the separability rank of the output kernel matrix and can be much smaller than $N$. We finally run Monte Carlo simulations to demonstrate the proposed kernels and the obtained theoretical results.
comment: 17 pages, 7 figures
Least Squares Model Reduction: A Two-Stage System-Theoretic Interpretation
Model reduction simplifies complex dynamical systems while preserving essential properties. This paper revisits a recently proposed system-theoretic framework for least squares moment matching. It interprets least squares model reduction in terms of two steps process: constructing a surrogate model to satisfy interpolation constraints, then projecting it onto a reduced-order space. Using tools from output regulation theory and Krylov projections, this approach provides a new view on classical methods. For illustration, we reexamine the least-squares model reduction method by Lucas and Smith, offering new insights into its structure.
comment: 13th IFAC Symposium on Nonlinear Control Systems. arXiv admin note: substantial text overlap with arXiv:2110.06072
Learning-Based Tracking Perimeter Control for Two-region Macroscopic Traffic Dynamics
Leveraging the concept of the macroscopic fundamental diagram (MFD), perimeter control can alleviate network-level congestion by identifying critical intersections and regulating them effectively. Considering the time-varying nature of travel demand and the equilibrium of accumulation state, we extend the conventional set-point perimeter control (SPC) problem for the two-region MFD system as an optimal tracking perimeter control problem (OTPCP). Unlike the SPC schemes that stabilize the traffic dynamics to the desired equilibrium point, the proposed tracking perimeter control (TPC) scheme regulates the traffic dynamics to a desired trajectory in a differential framework. Due to the inherent network uncertainties, such as heterogeneity of traffic dynamics and demand disturbance, the system dynamics could be uncertain or even unknown. To address these issues, we propose an adaptive dynamic programming (ADP) approach to solving the OTPCP without utilizing the well-calibrated system dynamics. Numerical experiments demonstrate the effectiveness of the proposed ADP-based TPC. Compared with the SPC scheme, the proposed TPC scheme achieves a 20.01% reduction in total travel time and a 3.15% improvement in cumulative trip completion. Moreover, the proposed adaptive TPC approach can regulate the accumulation state under network uncertainties and demand disturbances to the desired time-varying equilibrium trajectory that aims to maximize the trip completion under a nominal demand pattern. These results validate the robustness of the adaptive TPC approach.
Convergent Anthropocene Systems-of-Systems: Overcoming the Limitations of System Dynamics with Hetero-functional Graph Theory
Understanding the complexity and interdependence of systems in the Anthropocene is essential for making informed decisions about societal challenges spanning geophysical, biophysical, sociocultural, and sociotechnical domains. This paper explores the potential of Hetero-functional Graph Theory (HFGT) as a quantification tool for converting Model-based Systems Engineering (MBSE), stated in the Systems Modeling Language (SysML), into dynamic simulations-offering a comprehensive alternative to System Dynamics (SD) for representing interdependent systems of systems in the Anthropocene. The two approaches are compared in terms of systems thinking abstractions, methodological flexibility, and their ability to represent dynamic, multi-functional systems. Through a comparative study, the Mono Lake system is simulated in Northern California using both SD, and MBSE and HFGT, to highlight technical, conceptual and analytical differences. The simulations show equivalent results. However, MBSE and HFGT provide distinct advantages in capturing the nuances of the system through a broader set of systems thinking abstractions and in managing adaptive, multi-functional system interactions. These strengths position MBSE and HFGT as a powerful and flexible approach for representing, modeling, analyzing, and simulating heterogeneous and complex systems-of-systems in the Anthropocene.
Power-Capping Metric Evaluation for Improving Energy Efficiency SP
With high-performance computing systems now running at exascale, optimizing power-scaling management and resource utilization has become more critical than ever. This paper explores runtime power-capping optimizations that leverage integrated CPU-GPU power management on architectures like the NVIDIA GH200 superchip. We evaluate energy-performance metrics that account for simultaneous CPU and GPU power-capping effects by using two complementary approaches: speedup-energy-delay and a Euclidean distance-based multi-objective optimization method. By targeting a mostly compute-bound exascale science application, the Locally Self-Consistent Multiple Scattering (LSMS), we explore challenging scenarios to identify potential opportunities for energy savings in exascale applications, and we recognize that even modest reductions in energy consumption can have significant overall impacts. Our results highlight how GPU task-specific dynamic power-cap adjustments combined with integrated CPU-GPU power steering can improve the energy utilization of certain GPU tasks, thereby laying the groundwork for future adaptive optimization strategies.
comment: 14 pages, 3 figures, 2 tables. Accepted at the Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices, EESP Workshop, in conjunction with ISC High Performance 2025
Design and Analysis of a Grid-connected DC Fast Charging Station for Dhaka-Chittagong Highway
The growing adoption of electric vehicles (EVs) necessitates the development of efficient and reliable charging infrastructure, particularly fast charging stations (FCS) for addressing challenges such as range anxiety and long charging times. This paper presents the design and feasibility analysis of a grid-connected DC fast charging station for the Dhaka-Chittagong highway, a critical transportation corridor in Bangladesh. The proposed system incorporates advanced components, including a step-down transformer, Vienna Rectifier, and LC filter, to convert high-voltage AC power from the grid into a stable DC output. Simulated using MATLAB Simulink, the model delivers a peak output of 400V DC and 120 kW power, enabling rapid and efficient EV charging. The study also evaluates the system's performance, analyzing charging times, energy consumption, and distance ranges for representative EVs. By addressing key technical, environmental, and economic considerations, this paper provides a comprehensive roadmap for deploying fast charging infrastructure, fostering EV adoption, and advancing sustainable transportation in Bangladesh.
comment: Accepted to 4th IEEE-ECCE
Global linearization of asymptotically stable systems without hyperbolicity
We give a proof of an extension of the Hartman-Grobman theorem to nonhyperbolic but asymptotically stable equilibria of vector fields. Moreover, the linearizing topological conjugacy is (i) defined on the entire basin of attraction if the vector field is complete, and (ii) a $C^{k\geq 1}$-diffeomorphism on the complement of the equilibrium if the vector field is $C^k$ and the underlying space is not $5$-dimensional. We also show that the $C^k$ statement in the $5$-dimensional case is equivalent to the $4$-dimensional smooth Poincar\'{e} conjecture.
comment: 7 pages
Diffusion Predictive Control with Constraints
Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their ability to handle unseen and dynamic conditions where novel constraints not represented in the training data must be satisfied. To overcome this limitation, we propose diffusion predictive control with constraints (DPCC), an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. DPCC incorporates model-based projections into the denoising process of a trained trajectory diffusion model and uses constraint tightening to account for model mismatch. This allows us to generate constraint-satisfying, dynamically feasible, and goal-reaching trajectories for predictive control. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
comment: Accepted to L4DC 2025. Code: https://github.com/ralfroemer99/dpcc. 14 pages, 3 figures, 3 tables
DeepCEE: Efficient Cross-Region Model Distributed Training System under Heterogeneous GPUs and Networks
Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model training system. We propose DeepCEE, a geo-distributed model training system tailored for heterogeneous GPUs and networks in CEE environments. DeepCEE adopts a communication-centric design philosophy to tackle challenges arising from slow and unstable inter-region networks. It begins with a heterogeneous device profiler that identifies and groups devices based on both network and compute characteristics. Leveraging device groups, DeepCEE implements compact, zero-bubble pipeline parallelism, automatically deriving optimal parallel strategies. To further adapt to runtime variability, DeepCEE integrates a dynamic environment adapter that reacts to network fluctuations. Extensive evaluations demonstrate that DeepCEE achieves 1.3-2.8x higher training throughput compared to widely used and SOTA training systems.
Composable Uncertainty in Symmetric Monoidal Categories for Design Problems
Applied category theory often studies symmetric monoidal categories (SMCs) whose morphisms represent open systems. These structures naturally accommodate complex wiring patterns, leveraging (co)monoidal structures for splitting and merging wires, or compact closed structures for feedback. A key example is the compact closed SMC of design problems (DP), which enables a compositional approach to co-design in engineering. However, in practice, the systems of interest may not be fully known. Recently, Markov categories have emerged as a powerful framework for modeling uncertain processes. In this work, we demonstrate how to integrate this perspective into the study of open systems while preserving consistency with the underlying SMC structure. To this end, we employ the change-of-base construction for enriched categories, replacing the morphisms of a symmetric monoidal $\mathcal{V}$-category $\mathcal{C}$ with parametric maps $A \to \mathcal{C}(X,Y)$ in a Markov category induced by a symmetric monoidal monad. This results in a symmetric monoidal 2-category $N_*\mathcal{C}$ with the same objects as $\mathcal{C}$ and reparametrization 2-cells. By choosing different monads, we capture various types of uncertainty. The category underlying $\mathcal{C}$ embeds into $N_*\mathcal{C}$ via a strict symmetric monoidal functor, allowing (co)monoidal and compact closed structures to be transferred. Applied to DP, this construction leads to categories of practical relevance, such as parametrized design problems for optimization, and parametrized distributions of design problems for decision theory and Bayesian learning.
comment: 22 pages, 2 figures, accepted to Applied Category Theory 2025
Full LTL Synthesis over Infinite-state Arenas
Recently, interest has increased in applying reactive synthesis to richer-than-Boolean domains. A major (undecidable) challenge in this area is to establish when certain repeating behaviour terminates in a desired state when the number of steps is unbounded. Existing approaches struggle with this problem, or can handle at most deterministic games with B\"uchi goals. This work goes beyond by contributing the first effectual approach to synthesis with full LTL objectives, based on Boolean abstractions that encode both safety and liveness properties of the underlying infinite arena. We take a CEGAR approach: attempting synthesis on the Boolean abstraction, checking spuriousness of abstract counterstrategies through invariant checking, and refining the abstraction based on counterexamples. We reduce the complexity, when restricted to predicates, of abstracting and synthesising by an exponential through an efficient binary encoding. This also allows us to eagerly identify useful fairness properties. Our discrete synthesis tool outperforms the state-of-the-art on linear integer arithmetic (LIA) benchmarks from literature, solving almost double as many syntesis problems as the current state-of-the-art. It also solves slightly more problems than the second-best realisability checker, in one-third of the time. We also introduce benchmarks with richer objectives that other approaches cannot handle, and evaluate our tool on them.
comment: Extended version of identically-titled paper to be published in the proceedings of CAV 2025
Radar Network for Gait Monitoring: Technology and Validation
In recent years, radar-based devices have emerged as an alternative approach for gait monitoring. However, the radar configuration and the algorithms used to extract the gait parameters often differ between contributions, lacking a systematic evaluation of the most appropriate setup. Additionally, radar-based studies often exclude motorically impaired subjects, leaving it unclear whether the existing algorithms are applicable to such populations. In this paper, a radar network is developed and validated by monitoring the gait of five healthy individuals and three patients with Parkinson's disease. Six configurations and four algorithms were compared using Vicon as ground-truth to determine the most appropriate solution for gait monitoring. The best results were obtained using only three nodes: two oriented towards the feet and one towards the torso. The most accurate stride velocity and distance in the state of the art were obtained with this configuration. Moreover, we show that analyzing the feet velocity increases the reliability of the temporal parameters, especially with aged or motorically impaired subjects. The contribution is significant for the implementation of radar networks in clinical and domestic environments, as it addresses critical aspects concerning the radar network configuration and algorithms.
Designing RF-Powered Battery-Less Electronic Shelf Labels With COTS Components
This paper presents a preliminary study exploring the feasibility of designing batteryless electronic shelf labels (ESLs) powered by radio frequency wireless power transfer using commercial off-the-shelf components. The proposed ESL design is validated through a dedicated testbed and involves a detailed analysis of design choices, including energy consumption, energy conversion, and storage solutions. A leaded aluminium electrolytic capacitor is selected as the primary energy storage element, balancing cost and performance while maintaining compactness. Experimental evaluations demonstrate that an ESL can update its display within 4 to 120 minutes, depending on input power and RF frequency, with harvester efficiencies reaching up to 30 %. Challenges such as low harvester efficiency, extended update times, and hardware constraints are identified, highlighting opportunities for future optimizations. This work provides valuable insights into system design considerations for RF-powered ESLs and establishes a foundation for further research in energy-neutral Internet of Things applications.
comment: 12 pages, 5 figures and accepted for presentation at the IEEE Wireless Power Technology Conference and Expo
Enhanced Prediction Model for Time Series Characterized by GARCH via Interval Type-2 Fuzzy Inference System
GARCH-type time series (characterized by Generalized Autoregressive Conditional Heteroskedasticity) exhibit pronounced volatility, autocorrelation, and heteroskedasticity. To address these challenges and enhance predictive accuracy, this study introduces a hybrid forecasting framework that integrates the Interval Type-2 Fuzzy Inference System (IT2FIS) with the GARCH model. Leveraging the interval-based uncertainty representation of IT2FIS and the volatility-capturing capability of GARCH, the proposed model effectively mitigates the adverse impact of heteroskedasticity on prediction reliability. Specifically, the GARCH component estimates conditional variance, which is subsequently incorporated into the Gaussian membership functions of IT2FIS. This integration transforms IT2FIS into an adaptive variable-parameter system, dynamically aligning with the time-varying volatility of the target series. Through systematic parameter optimization, the framework not only captures intricate volatility patterns but also accounts for heteroskedasticity and epistemic uncertainties during modeling, thereby improving both prediction precision and model robustness. Experimental validation employs diverse datasets, including air quality concentration, urban traffic flow, and energy consumption. Comparative analyses are conducted against models: the GARCH-Takagi-Sugeno-Kang (GARCH-TSK) model, fixed-variance time series models, the GARCH-Gated Recurrent Unit (GARCH-GRU), and Long Short-Term Memory (LSTM) networks. The results indicate that the proposed model achieves superior predictive performance across the majority of test scenarios in error metrics. These findings underscore the effectiveness of hybrid approaches in forecasting uncertainty for GARCH-type time series, highlighting their practical utility in real-world time series forecasting applications.
comment: 40 pages, 13 figures, references added
Robustness to Model Approximation, Model Learning From Data, and Sample Complexity in Wasserstein Regular MDPs
The paper studies the robustness properties of discrete-time stochastic optimal control under Wasserstein model approximation for both discounted cost and average cost criteria. Specifically, we study the performance loss when applying an optimal policy designed for an approximate model to the true dynamics compared with the optimal cost for the true model under the sup-norm-induced metric, and relate it to the Wasserstein-1 distance between the approximate and true transition kernels. A primary motivation of this analysis is empirical model learning, as well as empirical noise distribution learning, where Wasserstein convergence holds under mild conditions but stronger convergence criteria, such as total variation, may not. We discuss applications of the results to the disturbance estimation problem, where sample complexity bounds are given, and also to a general empirical model learning approach, obtained under either Markov or i.i.d. learning settings.
comment: 36 pages
Policy Design for Two-sided Platforms with Participation Dynamics ICML2025
In two-sided platforms (e.g., video streaming or e-commerce), viewers and providers engage in interactive dynamics: viewers benefit from increases in provider populations, while providers benefit from increases in viewer population. Despite the importance of such "population effects" on long-term platform health, recommendation policies do not generally take the participation dynamics into account. This paper thus studies the dynamics and recommender policy design on two-sided platforms under the population effects for the first time. Our control- and game-theoretic findings warn against the use of the standard "myopic-greedy" policy and shed light on the importance of provider-side considerations (i.e., effectively distributing exposure among provider groups) to improve social welfare via population growth. We also present a simple algorithm to optimize long-term social welfare by taking the population effects into account, and demonstrate its effectiveness in synthetic and real-data experiments. Our experiment code is available at https://github.com/sdean-group/dynamics-two-sided-market.
comment: ICML2025
A Concentration Bound for TD(0) with Function Approximation
We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
comment: Submitted to Stochastic Systems
Improving Generative Inverse Design of Rectangular Patch Antennas with Test Time Optimization
We propose a two-stage deep learning framework for the inverse design of rectangular patch antennas. Our approach leverages generative modeling to learn a latent representation of antenna frequency response curves and conditions a subsequent generative model on these responses to produce feasible antenna geometries. We further demonstrate that leveraging search and optimization techniques at test-time improves the accuracy of the generated designs and enables consideration of auxiliary objectives such as manufacturability. Our approach generalizes naturally to different design criteria, and can be easily adapted to more complex geometric design spaces.
comment: Code and dataset available at https://github.com/becklabs/patch-antenna-tto
Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation
This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property without requiring convergence of the estimated state by the state observer, which plays a key role in establishing the off-policy learning control approaches. By value iteration and policy iteration schemes, the adaptive dynamic programming based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.
comment: 16 pages, 5 figures
Learning Enhanced Ensemble Filters
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. This shortcoming is addressed by approximating the mean-field evolution using a novel form of neural operator taking probability distributions as input: a measure neural mapping (MNM). A MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined in both the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. The derivation of methods from a mean-field formulation allows a single parameterization of the algorithm to be deployed at different ensemble sizes. In practice fine-tuning of a small number of parameters, for specific ensemble sizes, further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root mean-square-error performance relative to leading methods in filtering the Lorenz 96 and Kuramoto-Sivashinsky models.
comment: Preprint submitted to Journal of Computational Physics
Outlier-Robust Linear System Identification Under Heavy-tailed Noise
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we work under a significantly weaker noise model, assuming nothing more than the existence of the fourth moment of the noise distribution. For this setting, we provide the first set of results demonstrating that one can obtain sample-complexity bounds for linear system identification that are nearly of the same order as under sub-Gaussian noise. To achieve such results, we develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators, and then boosting their performance using suitable tools from high-dimensional robust statistics. Interestingly, our analysis reveals how the kurtosis of the noise distribution, a measure of heavy-tailedness, affects the number of trajectories needed to achieve desired estimation error bounds. Finally, we show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data. Our work takes the first steps towards building a robust statistical learning theory for control under non-ideal assumptions on the data-generating process.
Polynomial Chaos Expanded Gaussian Process
In complex and unknown processes, global models are initially generated over the entire experimental space but often fail to provide accurate predictions in local areas. A common approach is to use local models, which requires partitioning the experimental space and training multiple models, adding significant complexity. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This provides a mathematically interpretable approach that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP, highlighting model performance that is often competitive with or better than previous methods. A key advantage of the presented model is its interpretable hyperparameters along with training and prediction runtimes comparable to those of a GP.
comment: Manuscript: 38 pages, 12 figures, 11 tables
Chance-Constrained Sampling-Based MPC for Collision Avoidance in Uncertain Dynamic Environments
Navigating safely in dynamic and uncertain environments is challenging due to uncertainties in perception and motion. This letter presents the Chance-Constrained Unscented Model Predictive Path Integral (C2U-MPPI) framework, a robust sampling-based Model Predictive Control (MPC) algorithm that addresses these challenges by leveraging the U-MPPI control strategy with integrated probabilistic chance constraints, enabling more reliable and efficient navigation under uncertainty. Unlike gradient-based MPC methods, our approach (i) avoids linearization of system dynamics by directly applying non-convex and nonlinear chance constraints, enabling more accurate and flexible optimization, and (ii) enhances computational efficiency by leveraging a deterministic form of probabilistic constraints and employing a layered dynamic obstacle representation, enabling real-time handling of multiple obstacles. Extensive experiments in simulated and real-world human-shared environments validate the effectiveness of our algorithm against baseline methods, showcasing its capability to generate feasible trajectories and control inputs that adhere to system dynamics and constraints in dynamic settings, enabled by unscented-based sampling strategy and risk-sensitive trajectory evaluation. A supplementary video is available at: https://youtu.be/FptAhvJlQm8.
comment: This paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L), May 2025. It comprises 9 pages, 3 figures, and 7 tables
Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks
We provide a communication- and computation-efficient method for distributed submodular optimization in robot mesh networks. Submodularity is a property of diminishing returns that arises in active information gathering such as mapping, surveillance, and target tracking. Our method, Resource-Aware distributed Greedy (RAG), introduces a new distributed optimization paradigm that enables scalable and near-optimal action coordination. To this end, RAG requires each robot to make decisions based only on information received from and about their neighbors. In contrast, the current paradigms allow the relay of information about all robots across the network. As a result, RAG's decision-time scales linearly with the network size, while state-of-the-art near-optimal submodular optimization algorithms scale cubically. We also characterize how the designed mesh-network topology affects RAG's approximation performance. Our analysis implies that sparser networks favor scalability without proportionally compromising approximation performance: while RAG's decision time scales linearly with network size, the gain in approximation performance scales sublinearly. We demonstrate RAG's performance in simulated scenarios of area detection with up to 45 robots, simulating realistic robot-to-robot (r2r) communication speeds such as the 0.25 Mbps speed of the Digi XBee 3 Zigbee 3.0. In the simulations, RAG enables real-time planning, up to three orders of magnitude faster than competitive near-optimal algorithms, while also achieving superior mean coverage performance. To enable the simulations, we extend the high-fidelity and photo-realistic simulator AirSim by integrating a scalable collaborative autonomy pipeline to tens of robots and simulating r2r communication delays. Our code is available at https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim.
comment: Accepted to IEEE Transactions on Robotics
Moving Obstacle Collision Avoidance via Chance-Constrained MPC with CBF
Model predictive control (MPC) with control barrier functions (CBF) is a promising solution to address the moving obstacle collision avoidance (MOCA) problem. Unlike MPC with distance constraints (MPC-DC), this approach facilitates early obstacle avoidance without the need to increase prediction horizons. However, the existing MPC-CBF method is deterministic and fails to account for perception uncertainties. This paper proposes a generalized MPC-CBF approach for stochastic scenarios, which maintains the advantages of the deterministic method for addressing the MOCA problem. Specifically, the chance-constrained MPC-CBF (CC-MPC-CBF) technique is introduced to ensure that a user-defined collision avoidance probability is met by utilizing probabilistic CBFs. However, due to the potential empty intersection between the reachable set and the safe region confined by CBF constraints, the CC-MPC-CBF problem can pose challenges in achieving feasibility. To address this issue, we propose a sequential implementation approach that involves solving a standard MPC optimization problem followed by a predictive safety filter optimization, which leads to improved feasibility. Furthermore, we introduce an iterative convex optimization scheme to further expedite the resolution of the predictive safety filter, which results in an efficient approach to tackling the non-convex CC-MPC-CBF problem. We apply our proposed algorithm to a double integrator system for MOCA, and we showcase its resilience to obstacle measurement uncertainties and favorable feasibility properties.
Systems and Control (EESS)
Quasi Steady-State Frequency
Accurate frequency estimation is critical for the control, monitoring and protection of electrical power systems, in particular, of systems with a high penetration of power electronics. This paper introduces the novel concept of Quasi Steady-State (QSS) frequency as a quantity that fills the gap between stationary and instantaneous frequency. QSS frequency coincides with the fundamental frequency of an AC voltage in any stationary conditions, including unbalanced and non-sinusoidal, and is able to capture the time-varying fundamental frequency in transient conditions. The paper also proposes a metric borrowed from fluid dynamics, namely, the time derivative of the circulation, to define the scope of validity of the QSS frequency. Analytical examples as well as a case study based on a fully-fledged EMT model of the IEEE 39-bus system serve to illustrate, respectively, the properties of the QSS frequency and its behavior in transient conditions.
WiCAL: Accurate Wi-Fi-Based 3D Localization Enabled by Collaborative Antenna Arrays
Accurate 3D localization is essential for realizing advanced sensing functionalities in next-generation Wi-Fi communication systems. This study investigates the potential of multistatic localization in Wi-Fi networks through the deployment of multiple cooperative antenna arrays. The collaborative gain offered by these arrays is twofold: (i) intra-array coherent gain at the wavelength scale among antenna elements, and (ii) inter-array cooperative gain across arrays. To evaluate the feasibility and performance of this approach, we develop WiCAL (Wi-Fi Collaborative Antenna Localization), a system built upon commercial Wi-Fi infrastructure equipped with uniform rectangular arrays. These arrays are driven by multiplexing embedded radio frequency chains available in standard access points or user devices, thereby eliminating the need for sophisticated, costly, and power-hungry multi-transceiver modules typically required in multiple-input and multiple-output systems. To address phase offsets introduced by RF chain multiplexing, we propose a three-stage, fine-grained phase alignment scheme to synchronize signals across antenna elements within each array. A bidirectional spatial smoothing MUSIC algorithm is employed to estimate angles of arrival (AoAs) and mitigate performance degradation caused by correlated interference. To further exploit inter-array cooperative gain, we elaborate on the synchronization mechanism among distributed URAs, which enables direct position determination by bypassing intermediate angle estimation. Once synchronized, the distributed URAs effectively form a virtual large-scale array, significantly enhancing spatial resolution and localization accuracy.
comment: 14 page, 22 figures
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
We extend the generalized global Hartman-Grobman theorem by Kvalheim and Sontag for flows to a case of asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277, 3 pages, comments are welcome
Distributed equilibrium seeking in aggregative games: linear convergence under singular perturbations lens
We present a fully-distributed algorithm for Nash equilibrium seeking in aggregative games over networks. The proposed scheme endows each agent with a gradient-based scheme equipped with a tracking mechanism to locally reconstruct the aggregative variable, which is not available to the agents. We show that our method falls into the framework of singularly perturbed systems, as it involves the interconnection between a fast subsystem - the global information reconstruction dynamics - with a slow one concerning the optimization of the local strategies. This perspective plays a key role in analyzing the scheme with a constant stepsize, and in proving its linear convergence to the Nash equilibrium in strongly monotone games with local constraints. By exploiting the flexibility of our aggregative variable definition (not necessarily the arithmetic average of the agents' strategy), we show the efficacy of our algorithm on a realistic voltage support case study for the smart grid.
comment: Presented at the 2024 IEEE 63rd Conference on Decision and Control (CDC), Milan, Italy. Accepted manuscript version, 7 pages. arXiv admin note: text overlap with arXiv:2210.14547
Stochastic Geometry-Based Performance Evaluation for LEO Satellite-Assisted Space Caching
To achieve the Internet of Things (IoT) vision,Mobile Edge Computing (MEC) is a promising technology aimed at providing low-latency computing services to user equipment (UE). However, terrestrial MEC network struggles to provide service to UEs in remote and maritime region. Low Earth Orbit (LEO) satellite networks have the potential to overcome geographical restrictions and provide seamless global coverage for UEs. In this paper, we provide the first attempt to use stochastic geometry to investigate the performance of implementing space caching with LEO satellites (SATs) in the MEC network. We study a LEO satellite-assisted space caching MEC network, and LEO SATs can be equipped with servers to enable space caching, with the advantage of seamless coverage to assist terrestrial CSs for serving UEs in remote or maritime reigon. Using stochastic geometry and queuing theory, we establish an analytical framework for this MEC network. Meanwhile, we develop association strategies for UEs to connect with LEO SATs or CSs and utilize stochastic geometry to derive uplink and downlink coverage probabilities, considering the diversity of task and service types. On this basis, we employ the queuing theory to calculate the average delay to evaluate the system performance. Through Monte Carlo simulations and numerical results, the system performance is evaluated. The results show the potential of SAT spatial caching in improving the performance of the MEC network. Additionally, our results reveal useful insights such as the significant impact of the altitude and number of LEO SATs on the average delay of the network, providing helpful system-level recommendations for the design and configuration of the space-caching MEC network.
comment: 15 pages, 12 figures, be accepted by IEEE IoTJ
Active Learning-Enhanced Dual Control for Angle-Only Initial Relative Orbit Determination
Accurate relative orbit determination is a key challenge in modern space operations, particularly when relying on angle-only measurements. The inherent observability limitations of this approach make initial state estimation difficult, impacting mission safety and performance. This work explores the use of active learning (AL) techniques to enhance observability by dynamically designing the input excitation signal offline and at runtime. Our approach leverages AL to design the input signal dynamically, enhancing the observability of the system without requiring additional hardware or predefined maneuvers. We incorporate a dual control technique to ensure target tracking while maintaining observability. The proposed method is validated through numerical simulations, demonstrating its effectiveness in estimating the initial relative state of the chaser and target spacecrafts and its robustness to various initial relative distances and observation periods.
Output Regulation of Linear Systems with Non-periodic Non-smooth Exogenous Signals
We address the output regulation problem of linear systems with non-smooth and non-periodic exogenous signals. Specifically, we first formulate and solve the full-information problem by designing a state-feedback controller. We study the solvability of the regulator equations, providing a new non-resonance condition. We then focus on the error-feedback problem, for which we design a (non-robust) internal model leveraging the concept of canonical realisation and applying a high-gain method for the stabilisation of the closed-loop system under the minimum-phase assumption. Finally, we study the regulation problem involving model parameter uncertainties. Drawing ideas from both hybrid and time-varying (smooth) output regulation, we propose two methods to establish an internal model that is robust to uncertainties. The first method is an extension of the hybrid internal model, while the second relies on a new concept of immersion. In this non-smooth case, the immersion is established based on integrals rather than derivatives. The effectiveness of the proposed solutions is illustrated by a circuit regulation example.
comment: 18 pages, 6 figures
A domain adaptation neural network for digital twin-supported fault diagnosis
Digital twins offer a promising solution to the lack of sufficient labeled data in deep learning-based fault diagnosis by generating simulated data for model training. However, discrepancies between simulation and real-world systems can lead to a significant drop in performance when models are applied in real scenarios. To address this issue, we propose a fault diagnosis framework based on Domain-Adversarial Neural Networks (DANN), which enables knowledge transfer from simulated (source domain) to real-world (target domain) data. We evaluate the proposed framework using a publicly available robotics fault diagnosis dataset, which includes 3,600 sequences generated by a digital twin model and 90 real sequences collected from physical systems. The DANN method is compared with commonly used lightweight deep learning models such as CNN, TCN, Transformer, and LSTM. Experimental results show that incorporating domain adaptation significantly improves the diagnostic performance. For example, applying DANN to a baseline CNN model improves its accuracy from 70.00% to 80.22% on real-world test data, demonstrating the effectiveness of domain adaptation in bridging the sim-to-real gap.
comment: Preprint accepted by ICCAD 2025 at Barcelona
Multi-Mode Process Control Using Multi-Task Inverse Reinforcement Learning
In the era of Industry 4.0 and smart manufacturing, process systems engineering must adapt to digital transformation. While reinforcement learning offers a model-free approach to process control, its applications are limited by the dependence on accurate digital twins and well-designed reward functions. To address these limitations, this paper introduces a novel framework that integrates inverse reinforcement learning (IRL) with multi-task learning for data-driven, multi-mode control design. Using historical closed-loop data as expert demonstrations, IRL extracts optimal reward functions and control policies. A latent-context variable is incorporated to distinguish modes, enabling the training of mode-specific controllers. Case studies on a continuous stirred tank reactor and a fed-batch bioreactor validate the effectiveness of this framework in handling multi-mode data and training adaptable controllers.
Efficient Spectral Control of Partially Observed Linear Dynamical Systems
We propose a new method for the problem of controlling linear dynamical systems under partial observation and adversarial disturbances. Our new algorithm, Double Spectral Control (DSC), matches the best known regret guarantees while exponentially improving runtime complexity over previous approaches in its dependence on the system's stability margin. Our key innovation is a two-level spectral approximation strategy, leveraging double convolution with a universal basis of spectral filters, enabling efficient and accurate learning of the best linear dynamical controllers.
Analysis of Joint Radar and Communication in Disaster Scenarios
With the increasing frequency and intensity of natural disasters, there is a necessity for advanced technologies that can provide reliable situational awareness and communication. Conventional systems are often inadequate due to unreliable infrastructure, power grid failures, high investment costs and scalability challenges. This paper explores the potential of ad-hoc mesh joint radar and communication (JRC) networks as a scalable, resilient, energy-efficient solution for disaster management that can operate independently of conventional infrastructure. The proposed JRC network enhances disaster response by integrating target detection (such as identifying vital signs, hazardous leaks, and fires) with communication capabilities to ensure efficient information dissemination under intense clutter conditions. Key performance metrics, including data rate, Signal-to-Clutter and Noise Ratio (SCNR), probability of detection, and false alarm rate, are used to assess performance. An optimization approach is proposed to provide an energy-efficient resource allocation scheme. The results show the performance of ad-hoc mesh JRC systems, underscoring their potential to enhance disaster management efforts by addressing unique operational challenges.
comment: 6 pages
Uncertainty Partitioning with Probabilistic Feasibility and Performance Guarantees for Chance-Constrained Optimization
We propose a novel distribution-free scheme to solve optimization problems where the goal is to minimize the expected value of a cost function subject to probabilistic constraints. Unlike standard sampling-based methods, our idea consists of partitioning the uncertainty domain in a user-defined number of sets, enabling more flexibility in the trade-off between conservatism and computational complexity. We provide sufficient conditions to ensure that our approximated problem is feasible for the original stochastic program, in terms of chance constraint satisfaction. In addition, we perform a rigorous performance analysis, by quantifying the distance between the optimal values of the original and the approximated problem. We show that our approach is tractable for optimization problems that include model predictive control of piecewise affine systems, and we demonstrate the benefits of our approach, in terms of the trade-off between conservatism and computational complexity, on a numerical example.
comment: Submitted to IEEE Transactions on Automatic Control
COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle
An unmanned deformable vehicle is a wheel-legged robot transforming between two configurations: vehicular and humanoid states, with different motion modes and stability characteristics. To address motion stability in multiple configurations, a center-of-mass adjustment mechanism was designed. Further, a motion stability hierarchical control algorithm was proposed, and an electromechanical model based on a two-degree-of-freedom center-of-mass adjustment mechanism was established. An unmanned-deformable-vehicle vehicular-state steady-state steering dynamics model and a gait planning kinematic model of humanoid state walking were established. A stability hierarchical control strategy was designed to realize the stability control. The results showed that the steady-state steering stability in vehicular state and the walking stability in humanoid state could be significantly improved by controlling the slider motion.
Research on a Two-Layer Demand Response Framework for Electric Vehicle Users and Aggregators Based on LLMs
The widespread adoption of electric vehicles (EVs) has increased the importance of demand response in smart grids. This paper proposes a two-layer demand response optimization framework for EV users and aggregators, leveraging large language models (LLMs) to balance electricity supply and demand and optimize energy utilization during EV charging. The upper-layer model, focusing on the aggregator, aims to maximize profits by adjusting retail electricity prices. The lower-layer model targets EV users, using LLMs to simulate charging demands under varying electricity prices and optimize both costs and user comfort. The study employs a multi-threaded LLM decision generator to dynamically analyze user behavior, charging preferences, and psychological factors. The framework utilizes the PSO method to optimize electricity prices, ensuring user needs are met while increasing aggregator profits. Simulation results show that the proposed model improves EV charging efficiency, alleviates peak power loads, and stabilizes smart grid operations.
Effective Fixed-Time Control for Constrained Nonlinear System
In this paper, we tackle the state transformation problem in non-strict full state-constrained systems by introducing an adaptive fixed-time control method, utilizing a one-to-one asymmetric nonlinear mapping auxiliary system. Additionally, we develop a class of multi-threshold event-triggered control strategies that facilitate autonomous controller updates, substantially reducing communication resource consumption. Notably, the self-triggered strategy distinguishes itself from other strategies by obviating the need for continuous real-time monitoring of the controller's state variables. By accurately forecasting the subsequent activation instance, this strategy significantly optimizes the efficiency of the control system. Moreover, our theoretical analysis demonstrates that the semi-global practical fixed-time stability (SPFTS) criterion guarantees both tracking accuracy and closed-loop stability under state constraints, with convergence time independent of initial conditions. Finally, simulation results reveal that the proposed method significantly decreases the frequency of control command updates while maintaining tracking accuracy.
Collision-free Control Barrier Functions for General Ellipsoids via Separating Hyperplane
This paper presents a novel collision avoidance method for general ellipsoids based on control barrier functions (CBFs) and separating hyperplanes. First, collision-free conditions for general ellipsoids are analytically derived using the concept of dual cones. These conditions are incorporated into the CBF framework by extending the system dynamics of controlled objects with separating hyperplanes, enabling efficient and reliable collision avoidance. The validity of the proposed collision-free CBFs is rigorously proven, ensuring their effectiveness in enforcing safety constraints. The proposed method requires only single-level optimization, significantly reducing computational time compared to state-of-the-art methods. Numerical simulations and real-world experiments demonstrate the effectiveness and practicality of the proposed algorithm.
Physics-Informed Neural Network for Cross-Domain Predictive Control of Tapered Amplifier Thermal Stabilization
Thermally induced laser noise poses a critical limitation to the sensitivity of quantum sensor arrays employing ultra-stable amplified lasers, primarily stemming from nonlinear gain-temperature coupling effects in tapered amplifiers (TAs). To address this challenge, we present a robust intelligent control strategy that synergistically integrates an encoder-decoder physics-informed gated recurrent unit (PI-GRU) network with a model predictive control (MPC) framework. Our methodology incorporates physical soft constraints into the neural network architecture, yielding a predictive model with enhanced physical consistency that demonstrates robust extrapolation capabilities beyond the training data distribution. Leveraging the PI-GRU model's accurate multi-step predictive performance, we implement a hierarchical parallel MPC architecture capable of real-time thermal instability compensation. This hybrid approach achieves cross-domain consistent thermal stabilization in TAs under diverse laser power operations. Remarkably, while trained exclusively on low-power operational data, our system demonstrates exceptional generalization, improving prediction accuracy by 58.2% and temperature stability by 69.1% in previously unseen high-power operating regimes, as experimentally validated. The novel synchronization of physics-informed neural networks with advanced MPC frameworks presented in this work establishes a groundbreaking paradigm for addressing robustness challenges in cross-domain predictive control applications, overcoming conventional modeling limitations.
Data-Driven Existence and Design of Target Output Controllers
Target output controllers aim at regulating a system's target outputs by placing poles of a suitable subsystem using partial state feedback, where full state controllability is not required. This paper establishes existence conditions for such controllers using input and partial state data, where the system dynamics are unknown. The approach bypasses traditional system identification steps and leverages the intrinsic structure of historical data to certify controller existence and synthesize a suitable feedback gain. Analytical characterizations are provided, ensuring that the resulting closed-loop system satisfies desired performance objectives such as pole placement or stabilization. Data-driven algorithms are then proposed to design target output controllers directly from data without identifying system parameters, where controllers with the order matching the number of target outputs and with minimum-order augmented target outputs are both addressed. Furthermore, a separation principle is revealed, decoupling the design of target output controllers from state observers. This enables the development of data-driven observer-based controllers that integrate estimation and control. Numerical examples validate the theoretical results and demonstrate the efficacy of the proposed approach.
On Kernel Design for Regularized Volterra Series Identification of Wiener-Hammerstein Systems
There have been increasing interests on the Volterra series identification with the kernel-based regularization method. The major difficulties are on the kernel design and efficiency of the corresponding implementation. In this paper, we first assume that the underlying system to be identified is the Wiener-Hammerstein (WH) system with polynomial nonlinearity. We then show how to design kernels with nonzero off-diagonal blocks for Volterra maps by taking into account the prior knowledge of the linear blocks and the structure of WH systems. Moreover, exploring the structure of the designed kernels leads to the same computational complexity as the state-of-the-art result, i.e., $O(N^3)$, where $N$ is the sample size, but with a significant difference that the proposed kernels are designed in a direct and flexible way. In addition, for a special case of the kernel and a class of widely used input signals, further exploring the separable structure of the output kernel matrix can lower the computational complexity from $O(N^3)$ to $O(N\gamma^2)$, where $\gamma$ is the separability rank of the output kernel matrix and can be much smaller than $N$. We finally run Monte Carlo simulations to demonstrate the proposed kernels and the obtained theoretical results.
comment: 17 pages, 7 figures
Least Squares Model Reduction: A Two-Stage System-Theoretic Interpretation
Model reduction simplifies complex dynamical systems while preserving essential properties. This paper revisits a recently proposed system-theoretic framework for least squares moment matching. It interprets least squares model reduction in terms of two steps process: constructing a surrogate model to satisfy interpolation constraints, then projecting it onto a reduced-order space. Using tools from output regulation theory and Krylov projections, this approach provides a new view on classical methods. For illustration, we reexamine the least-squares model reduction method by Lucas and Smith, offering new insights into its structure.
comment: 13th IFAC Symposium on Nonlinear Control Systems. arXiv admin note: substantial text overlap with arXiv:2110.06072
Learning-Based Tracking Perimeter Control for Two-region Macroscopic Traffic Dynamics
Leveraging the concept of the macroscopic fundamental diagram (MFD), perimeter control can alleviate network-level congestion by identifying critical intersections and regulating them effectively. Considering the time-varying nature of travel demand and the equilibrium of accumulation state, we extend the conventional set-point perimeter control (SPC) problem for the two-region MFD system as an optimal tracking perimeter control problem (OTPCP). Unlike the SPC schemes that stabilize the traffic dynamics to the desired equilibrium point, the proposed tracking perimeter control (TPC) scheme regulates the traffic dynamics to a desired trajectory in a differential framework. Due to the inherent network uncertainties, such as heterogeneity of traffic dynamics and demand disturbance, the system dynamics could be uncertain or even unknown. To address these issues, we propose an adaptive dynamic programming (ADP) approach to solving the OTPCP without utilizing the well-calibrated system dynamics. Numerical experiments demonstrate the effectiveness of the proposed ADP-based TPC. Compared with the SPC scheme, the proposed TPC scheme achieves a 20.01% reduction in total travel time and a 3.15% improvement in cumulative trip completion. Moreover, the proposed adaptive TPC approach can regulate the accumulation state under network uncertainties and demand disturbances to the desired time-varying equilibrium trajectory that aims to maximize the trip completion under a nominal demand pattern. These results validate the robustness of the adaptive TPC approach.
Convergent Anthropocene Systems-of-Systems: Overcoming the Limitations of System Dynamics with Hetero-functional Graph Theory
Understanding the complexity and interdependence of systems in the Anthropocene is essential for making informed decisions about societal challenges spanning geophysical, biophysical, sociocultural, and sociotechnical domains. This paper explores the potential of Hetero-functional Graph Theory (HFGT) as a quantification tool for converting Model-based Systems Engineering (MBSE), stated in the Systems Modeling Language (SysML), into dynamic simulations-offering a comprehensive alternative to System Dynamics (SD) for representing interdependent systems of systems in the Anthropocene. The two approaches are compared in terms of systems thinking abstractions, methodological flexibility, and their ability to represent dynamic, multi-functional systems. Through a comparative study, the Mono Lake system is simulated in Northern California using both SD, and MBSE and HFGT, to highlight technical, conceptual and analytical differences. The simulations show equivalent results. However, MBSE and HFGT provide distinct advantages in capturing the nuances of the system through a broader set of systems thinking abstractions and in managing adaptive, multi-functional system interactions. These strengths position MBSE and HFGT as a powerful and flexible approach for representing, modeling, analyzing, and simulating heterogeneous and complex systems-of-systems in the Anthropocene.
Power-Capping Metric Evaluation for Improving Energy Efficiency SP
With high-performance computing systems now running at exascale, optimizing power-scaling management and resource utilization has become more critical than ever. This paper explores runtime power-capping optimizations that leverage integrated CPU-GPU power management on architectures like the NVIDIA GH200 superchip. We evaluate energy-performance metrics that account for simultaneous CPU and GPU power-capping effects by using two complementary approaches: speedup-energy-delay and a Euclidean distance-based multi-objective optimization method. By targeting a mostly compute-bound exascale science application, the Locally Self-Consistent Multiple Scattering (LSMS), we explore challenging scenarios to identify potential opportunities for energy savings in exascale applications, and we recognize that even modest reductions in energy consumption can have significant overall impacts. Our results highlight how GPU task-specific dynamic power-cap adjustments combined with integrated CPU-GPU power steering can improve the energy utilization of certain GPU tasks, thereby laying the groundwork for future adaptive optimization strategies.
comment: 14 pages, 3 figures, 2 tables. Accepted at the Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices, EESP Workshop, in conjunction with ISC High Performance 2025
Design and Analysis of a Grid-connected DC Fast Charging Station for Dhaka-Chittagong Highway
The growing adoption of electric vehicles (EVs) necessitates the development of efficient and reliable charging infrastructure, particularly fast charging stations (FCS) for addressing challenges such as range anxiety and long charging times. This paper presents the design and feasibility analysis of a grid-connected DC fast charging station for the Dhaka-Chittagong highway, a critical transportation corridor in Bangladesh. The proposed system incorporates advanced components, including a step-down transformer, Vienna Rectifier, and LC filter, to convert high-voltage AC power from the grid into a stable DC output. Simulated using MATLAB Simulink, the model delivers a peak output of 400V DC and 120 kW power, enabling rapid and efficient EV charging. The study also evaluates the system's performance, analyzing charging times, energy consumption, and distance ranges for representative EVs. By addressing key technical, environmental, and economic considerations, this paper provides a comprehensive roadmap for deploying fast charging infrastructure, fostering EV adoption, and advancing sustainable transportation in Bangladesh.
comment: Accepted to 4th IEEE-ECCE
Global linearization of asymptotically stable systems without hyperbolicity
We give a proof of an extension of the Hartman-Grobman theorem to nonhyperbolic but asymptotically stable equilibria of vector fields. Moreover, the linearizing topological conjugacy is (i) defined on the entire basin of attraction if the vector field is complete, and (ii) a $C^{k\geq 1}$-diffeomorphism on the complement of the equilibrium if the vector field is $C^k$ and the underlying space is not $5$-dimensional. We also show that the $C^k$ statement in the $5$-dimensional case is equivalent to the $4$-dimensional smooth Poincar\'{e} conjecture.
comment: 7 pages
Diffusion Predictive Control with Constraints
Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their ability to handle unseen and dynamic conditions where novel constraints not represented in the training data must be satisfied. To overcome this limitation, we propose diffusion predictive control with constraints (DPCC), an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. DPCC incorporates model-based projections into the denoising process of a trained trajectory diffusion model and uses constraint tightening to account for model mismatch. This allows us to generate constraint-satisfying, dynamically feasible, and goal-reaching trajectories for predictive control. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
comment: Accepted to L4DC 2025. Code: https://github.com/ralfroemer99/dpcc. 14 pages, 3 figures, 3 tables
DeepCEE: Efficient Cross-Region Model Distributed Training System under Heterogeneous GPUs and Networks
Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model training system. We propose DeepCEE, a geo-distributed model training system tailored for heterogeneous GPUs and networks in CEE environments. DeepCEE adopts a communication-centric design philosophy to tackle challenges arising from slow and unstable inter-region networks. It begins with a heterogeneous device profiler that identifies and groups devices based on both network and compute characteristics. Leveraging device groups, DeepCEE implements compact, zero-bubble pipeline parallelism, automatically deriving optimal parallel strategies. To further adapt to runtime variability, DeepCEE integrates a dynamic environment adapter that reacts to network fluctuations. Extensive evaluations demonstrate that DeepCEE achieves 1.3-2.8x higher training throughput compared to widely used and SOTA training systems.
Composable Uncertainty in Symmetric Monoidal Categories for Design Problems
Applied category theory often studies symmetric monoidal categories (SMCs) whose morphisms represent open systems. These structures naturally accommodate complex wiring patterns, leveraging (co)monoidal structures for splitting and merging wires, or compact closed structures for feedback. A key example is the compact closed SMC of design problems (DP), which enables a compositional approach to co-design in engineering. However, in practice, the systems of interest may not be fully known. Recently, Markov categories have emerged as a powerful framework for modeling uncertain processes. In this work, we demonstrate how to integrate this perspective into the study of open systems while preserving consistency with the underlying SMC structure. To this end, we employ the change-of-base construction for enriched categories, replacing the morphisms of a symmetric monoidal $\mathcal{V}$-category $\mathcal{C}$ with parametric maps $A \to \mathcal{C}(X,Y)$ in a Markov category induced by a symmetric monoidal monad. This results in a symmetric monoidal 2-category $N_*\mathcal{C}$ with the same objects as $\mathcal{C}$ and reparametrization 2-cells. By choosing different monads, we capture various types of uncertainty. The category underlying $\mathcal{C}$ embeds into $N_*\mathcal{C}$ via a strict symmetric monoidal functor, allowing (co)monoidal and compact closed structures to be transferred. Applied to DP, this construction leads to categories of practical relevance, such as parametrized design problems for optimization, and parametrized distributions of design problems for decision theory and Bayesian learning.
comment: 22 pages, 2 figures, accepted to Applied Category Theory 2025
Full LTL Synthesis over Infinite-state Arenas
Recently, interest has increased in applying reactive synthesis to richer-than-Boolean domains. A major (undecidable) challenge in this area is to establish when certain repeating behaviour terminates in a desired state when the number of steps is unbounded. Existing approaches struggle with this problem, or can handle at most deterministic games with B\"uchi goals. This work goes beyond by contributing the first effectual approach to synthesis with full LTL objectives, based on Boolean abstractions that encode both safety and liveness properties of the underlying infinite arena. We take a CEGAR approach: attempting synthesis on the Boolean abstraction, checking spuriousness of abstract counterstrategies through invariant checking, and refining the abstraction based on counterexamples. We reduce the complexity, when restricted to predicates, of abstracting and synthesising by an exponential through an efficient binary encoding. This also allows us to eagerly identify useful fairness properties. Our discrete synthesis tool outperforms the state-of-the-art on linear integer arithmetic (LIA) benchmarks from literature, solving almost double as many syntesis problems as the current state-of-the-art. It also solves slightly more problems than the second-best realisability checker, in one-third of the time. We also introduce benchmarks with richer objectives that other approaches cannot handle, and evaluate our tool on them.
comment: Extended version of identically-titled paper to be published in the proceedings of CAV 2025
Radar Network for Gait Monitoring: Technology and Validation
In recent years, radar-based devices have emerged as an alternative approach for gait monitoring. However, the radar configuration and the algorithms used to extract the gait parameters often differ between contributions, lacking a systematic evaluation of the most appropriate setup. Additionally, radar-based studies often exclude motorically impaired subjects, leaving it unclear whether the existing algorithms are applicable to such populations. In this paper, a radar network is developed and validated by monitoring the gait of five healthy individuals and three patients with Parkinson's disease. Six configurations and four algorithms were compared using Vicon as ground-truth to determine the most appropriate solution for gait monitoring. The best results were obtained using only three nodes: two oriented towards the feet and one towards the torso. The most accurate stride velocity and distance in the state of the art were obtained with this configuration. Moreover, we show that analyzing the feet velocity increases the reliability of the temporal parameters, especially with aged or motorically impaired subjects. The contribution is significant for the implementation of radar networks in clinical and domestic environments, as it addresses critical aspects concerning the radar network configuration and algorithms.
Designing RF-Powered Battery-Less Electronic Shelf Labels With COTS Components
This paper presents a preliminary study exploring the feasibility of designing batteryless electronic shelf labels (ESLs) powered by radio frequency wireless power transfer using commercial off-the-shelf components. The proposed ESL design is validated through a dedicated testbed and involves a detailed analysis of design choices, including energy consumption, energy conversion, and storage solutions. A leaded aluminium electrolytic capacitor is selected as the primary energy storage element, balancing cost and performance while maintaining compactness. Experimental evaluations demonstrate that an ESL can update its display within 4 to 120 minutes, depending on input power and RF frequency, with harvester efficiencies reaching up to 30 %. Challenges such as low harvester efficiency, extended update times, and hardware constraints are identified, highlighting opportunities for future optimizations. This work provides valuable insights into system design considerations for RF-powered ESLs and establishes a foundation for further research in energy-neutral Internet of Things applications.
comment: 12 pages, 5 figures and accepted for presentation at the IEEE Wireless Power Technology Conference and Expo
Enhanced Prediction Model for Time Series Characterized by GARCH via Interval Type-2 Fuzzy Inference System
GARCH-type time series (characterized by Generalized Autoregressive Conditional Heteroskedasticity) exhibit pronounced volatility, autocorrelation, and heteroskedasticity. To address these challenges and enhance predictive accuracy, this study introduces a hybrid forecasting framework that integrates the Interval Type-2 Fuzzy Inference System (IT2FIS) with the GARCH model. Leveraging the interval-based uncertainty representation of IT2FIS and the volatility-capturing capability of GARCH, the proposed model effectively mitigates the adverse impact of heteroskedasticity on prediction reliability. Specifically, the GARCH component estimates conditional variance, which is subsequently incorporated into the Gaussian membership functions of IT2FIS. This integration transforms IT2FIS into an adaptive variable-parameter system, dynamically aligning with the time-varying volatility of the target series. Through systematic parameter optimization, the framework not only captures intricate volatility patterns but also accounts for heteroskedasticity and epistemic uncertainties during modeling, thereby improving both prediction precision and model robustness. Experimental validation employs diverse datasets, including air quality concentration, urban traffic flow, and energy consumption. Comparative analyses are conducted against models: the GARCH-Takagi-Sugeno-Kang (GARCH-TSK) model, fixed-variance time series models, the GARCH-Gated Recurrent Unit (GARCH-GRU), and Long Short-Term Memory (LSTM) networks. The results indicate that the proposed model achieves superior predictive performance across the majority of test scenarios in error metrics. These findings underscore the effectiveness of hybrid approaches in forecasting uncertainty for GARCH-type time series, highlighting their practical utility in real-world time series forecasting applications.
comment: 40 pages, 13 figures, references added
Robustness to Model Approximation, Model Learning From Data, and Sample Complexity in Wasserstein Regular MDPs
The paper studies the robustness properties of discrete-time stochastic optimal control under Wasserstein model approximation for both discounted cost and average cost criteria. Specifically, we study the performance loss when applying an optimal policy designed for an approximate model to the true dynamics compared with the optimal cost for the true model under the sup-norm-induced metric, and relate it to the Wasserstein-1 distance between the approximate and true transition kernels. A primary motivation of this analysis is empirical model learning, as well as empirical noise distribution learning, where Wasserstein convergence holds under mild conditions but stronger convergence criteria, such as total variation, may not. We discuss applications of the results to the disturbance estimation problem, where sample complexity bounds are given, and also to a general empirical model learning approach, obtained under either Markov or i.i.d. learning settings.
comment: 36 pages
Policy Design for Two-sided Platforms with Participation Dynamics ICML2025
In two-sided platforms (e.g., video streaming or e-commerce), viewers and providers engage in interactive dynamics: viewers benefit from increases in provider populations, while providers benefit from increases in viewer population. Despite the importance of such "population effects" on long-term platform health, recommendation policies do not generally take the participation dynamics into account. This paper thus studies the dynamics and recommender policy design on two-sided platforms under the population effects for the first time. Our control- and game-theoretic findings warn against the use of the standard "myopic-greedy" policy and shed light on the importance of provider-side considerations (i.e., effectively distributing exposure among provider groups) to improve social welfare via population growth. We also present a simple algorithm to optimize long-term social welfare by taking the population effects into account, and demonstrate its effectiveness in synthetic and real-data experiments. Our experiment code is available at https://github.com/sdean-group/dynamics-two-sided-market.
comment: ICML2025
A Concentration Bound for TD(0) with Function Approximation
We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
comment: Submitted to Stochastic Systems
Improving Generative Inverse Design of Rectangular Patch Antennas with Test Time Optimization
We propose a two-stage deep learning framework for the inverse design of rectangular patch antennas. Our approach leverages generative modeling to learn a latent representation of antenna frequency response curves and conditions a subsequent generative model on these responses to produce feasible antenna geometries. We further demonstrate that leveraging search and optimization techniques at test-time improves the accuracy of the generated designs and enables consideration of auxiliary objectives such as manufacturability. Our approach generalizes naturally to different design criteria, and can be easily adapted to more complex geometric design spaces.
comment: Code and dataset available at https://github.com/becklabs/patch-antenna-tto
Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation
This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property without requiring convergence of the estimated state by the state observer, which plays a key role in establishing the off-policy learning control approaches. By value iteration and policy iteration schemes, the adaptive dynamic programming based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.
comment: 16 pages, 5 figures
Learning Enhanced Ensemble Filters
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. This shortcoming is addressed by approximating the mean-field evolution using a novel form of neural operator taking probability distributions as input: a measure neural mapping (MNM). A MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined in both the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. The derivation of methods from a mean-field formulation allows a single parameterization of the algorithm to be deployed at different ensemble sizes. In practice fine-tuning of a small number of parameters, for specific ensemble sizes, further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root mean-square-error performance relative to leading methods in filtering the Lorenz 96 and Kuramoto-Sivashinsky models.
comment: Preprint submitted to Journal of Computational Physics
Outlier-Robust Linear System Identification Under Heavy-tailed Noise
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we work under a significantly weaker noise model, assuming nothing more than the existence of the fourth moment of the noise distribution. For this setting, we provide the first set of results demonstrating that one can obtain sample-complexity bounds for linear system identification that are nearly of the same order as under sub-Gaussian noise. To achieve such results, we develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators, and then boosting their performance using suitable tools from high-dimensional robust statistics. Interestingly, our analysis reveals how the kurtosis of the noise distribution, a measure of heavy-tailedness, affects the number of trajectories needed to achieve desired estimation error bounds. Finally, we show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data. Our work takes the first steps towards building a robust statistical learning theory for control under non-ideal assumptions on the data-generating process.
Polynomial Chaos Expanded Gaussian Process
In complex and unknown processes, global models are initially generated over the entire experimental space but often fail to provide accurate predictions in local areas. A common approach is to use local models, which requires partitioning the experimental space and training multiple models, adding significant complexity. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This provides a mathematically interpretable approach that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP, highlighting model performance that is often competitive with or better than previous methods. A key advantage of the presented model is its interpretable hyperparameters along with training and prediction runtimes comparable to those of a GP.
comment: Manuscript: 38 pages, 12 figures, 11 tables
Chance-Constrained Sampling-Based MPC for Collision Avoidance in Uncertain Dynamic Environments
Navigating safely in dynamic and uncertain environments is challenging due to uncertainties in perception and motion. This letter presents the Chance-Constrained Unscented Model Predictive Path Integral (C2U-MPPI) framework, a robust sampling-based Model Predictive Control (MPC) algorithm that addresses these challenges by leveraging the U-MPPI control strategy with integrated probabilistic chance constraints, enabling more reliable and efficient navigation under uncertainty. Unlike gradient-based MPC methods, our approach (i) avoids linearization of system dynamics by directly applying non-convex and nonlinear chance constraints, enabling more accurate and flexible optimization, and (ii) enhances computational efficiency by leveraging a deterministic form of probabilistic constraints and employing a layered dynamic obstacle representation, enabling real-time handling of multiple obstacles. Extensive experiments in simulated and real-world human-shared environments validate the effectiveness of our algorithm against baseline methods, showcasing its capability to generate feasible trajectories and control inputs that adhere to system dynamics and constraints in dynamic settings, enabled by unscented-based sampling strategy and risk-sensitive trajectory evaluation. A supplementary video is available at: https://youtu.be/FptAhvJlQm8.
comment: This paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L), May 2025. It comprises 9 pages, 3 figures, and 7 tables
Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks
We provide a communication- and computation-efficient method for distributed submodular optimization in robot mesh networks. Submodularity is a property of diminishing returns that arises in active information gathering such as mapping, surveillance, and target tracking. Our method, Resource-Aware distributed Greedy (RAG), introduces a new distributed optimization paradigm that enables scalable and near-optimal action coordination. To this end, RAG requires each robot to make decisions based only on information received from and about their neighbors. In contrast, the current paradigms allow the relay of information about all robots across the network. As a result, RAG's decision-time scales linearly with the network size, while state-of-the-art near-optimal submodular optimization algorithms scale cubically. We also characterize how the designed mesh-network topology affects RAG's approximation performance. Our analysis implies that sparser networks favor scalability without proportionally compromising approximation performance: while RAG's decision time scales linearly with network size, the gain in approximation performance scales sublinearly. We demonstrate RAG's performance in simulated scenarios of area detection with up to 45 robots, simulating realistic robot-to-robot (r2r) communication speeds such as the 0.25 Mbps speed of the Digi XBee 3 Zigbee 3.0. In the simulations, RAG enables real-time planning, up to three orders of magnitude faster than competitive near-optimal algorithms, while also achieving superior mean coverage performance. To enable the simulations, we extend the high-fidelity and photo-realistic simulator AirSim by integrating a scalable collaborative autonomy pipeline to tens of robots and simulating r2r communication delays. Our code is available at https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim.
comment: Accepted to IEEE Transactions on Robotics
Moving Obstacle Collision Avoidance via Chance-Constrained MPC with CBF
Model predictive control (MPC) with control barrier functions (CBF) is a promising solution to address the moving obstacle collision avoidance (MOCA) problem. Unlike MPC with distance constraints (MPC-DC), this approach facilitates early obstacle avoidance without the need to increase prediction horizons. However, the existing MPC-CBF method is deterministic and fails to account for perception uncertainties. This paper proposes a generalized MPC-CBF approach for stochastic scenarios, which maintains the advantages of the deterministic method for addressing the MOCA problem. Specifically, the chance-constrained MPC-CBF (CC-MPC-CBF) technique is introduced to ensure that a user-defined collision avoidance probability is met by utilizing probabilistic CBFs. However, due to the potential empty intersection between the reachable set and the safe region confined by CBF constraints, the CC-MPC-CBF problem can pose challenges in achieving feasibility. To address this issue, we propose a sequential implementation approach that involves solving a standard MPC optimization problem followed by a predictive safety filter optimization, which leads to improved feasibility. Furthermore, we introduce an iterative convex optimization scheme to further expedite the resolution of the predictive safety filter, which results in an efficient approach to tackling the non-convex CC-MPC-CBF problem. We apply our proposed algorithm to a double integrator system for MOCA, and we showcase its resilience to obstacle measurement uncertainties and favorable feasibility properties.
Robotics
Target Tracking via LiDAR-RADAR Sensor Fusion for Autonomous Racing
High Speed multi-vehicle Autonomous Racing will increase the safety and performance of road-going Autonomous Vehicles. Precise vehicle detection and dynamics estimation from a moving platform is a key requirement for planning and executing complex autonomous overtaking maneuvers. To address this requirement, we have developed a Latency-Aware EKF-based Multi Target Tracking algorithm fusing LiDAR and RADAR measurements. The algorithm explots the different sensor characteristics by explicitly integrating the Range Rate in the EKF Measurement Function, as well as a-priori knowledge of the racetrack during state prediction. It can handle Out-Of-Sequence Measurements via Reprocessing using a double State and Measurement Buffer, ensuring sensor delay compensation with no information loss. This algorithm has been implemented on Team PoliMOVE's autonomous racecar, and was proved experimentally by completing a number of fully autonomous overtaking maneuvers at speeds up to 275 km/h.
comment: IEEE Conference, 6 pages
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers
Tactile sensing provides local essential information that is complementary to visual perception, such as texture, compliance, and force. Despite recent advances in visuotactile representation learning, challenges remain in fusing these modalities and generalizing across tasks and environments without heavy reliance on pre-trained vision-language models. Moreover, existing methods do not study positional encodings, thereby overlooking the multi-scale spatial reasoning needed to capture fine-grained visuotactile correlations. We introduce ViTaPEs, a transformer-based framework that robustly integrates visual and tactile input data to learn task-agnostic representations for visuotactile perception. Our approach exploits a novel multi-scale positional encoding scheme to capture intra-modal structures, while simultaneously modeling cross-modal cues. Unlike prior work, we provide provable guarantees in visuotactile fusion, showing that our encodings are injective, rigid-motion-equivariant, and information-preserving, validating these properties empirically. Experiments on multiple large-scale real-world datasets show that ViTaPEs not only surpasses state-of-the-art baselines across various recognition tasks but also demonstrates zero-shot generalization to unseen, out-of-domain scenarios. We further demonstrate the transfer-learning strength of ViTaPEs in a robotic grasping task, where it outperforms state-of-the-art baselines in predicting grasp success. Project page: https://sites.google.com/view/vitapes
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
Due to the powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority to mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases. Code and dataset will be found in https://github.com/Liuxueyi/ReasonPlan.
comment: 18 pages; 9 figures; https://github.com/Liuxueyi/ReasonPlan
A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup
In cluttered spaces, such as forests, drone picking up a payload via an abseil claw is an open challenge, as the cable is likely tangled and blocked by the branches and obstacles. To address such a challenge, in this work, a cooperative aerial system is proposed, which consists of a payload drone and a dexterous rappelling end droid. The two ends are linked via a Kevlar tether cable. The end droid is actuated by four propellers, which enable mid-air dexterous adjustment of clawing angle and guidance of cable movement. To avoid tanglement and rappelling obstacles, a trajectory optimization method that integrates cable length constraints and dynamic feasibility is developed, which guarantees safe pickup. A tether cable dynamic model is established to evaluate real-time cable status, considering both taut and sagging conditions. Simulation and real-world experiments are conducted to demonstrate that the proposed system is capable of picking up payload in cluttered spaces. As a result, the end droid can reach the target point successfully under cable constraints and achieve passive retrieval during the lifting phase without propulsion, which enables effective and efficient aerial manipulation.
comment: Video: https://youtu.be/dKrmzPdnblY
Uncertainty-Aware Safety-Critical Decision and Control for Autonomous Vehicles at Unsignalized Intersections
Reinforcement learning (RL) has demonstrated potential in autonomous driving (AD) decision tasks. However, applying RL to urban AD, particularly in intersection scenarios, still faces significant challenges. The lack of safety constraints makes RL vulnerable to risks. Additionally, cognitive limitations and environmental randomness can lead to unreliable decisions in safety-critical scenarios. Therefore, it is essential to quantify confidence in RL decisions to improve safety. This paper proposes an Uncertainty-aware Safety-Critical Decision and Control (USDC) framework, which generates a risk-averse policy by constructing a risk-aware ensemble distributional RL, while estimating uncertainty to quantify the policy's reliability. Subsequently, a high-order control barrier function (HOCBF) is employed as a safety filter to minimize intervention policy while dynamically enhancing constraints based on uncertainty. The ensemble critics evaluate both HOCBF and RL policies, embedding uncertainty to achieve dynamic switching between safe and flexible strategies, thereby balancing safety and efficiency. Simulation tests on unsignalized intersections in multiple tasks indicate that USDC can improve safety while maintaining traffic efficiency compared to baselines.
comment: 8 pages, 6 figures
Causal Bayesian Networks for Data-driven Safety Analysis of Complex Systems
Ensuring safe operation of safety-critical complex systems interacting with their environment poses significant challenges, particularly when the system's world model relies on machine learning algorithms to process the perception input. A comprehensive safety argumentation requires knowledge of how faults or functional insufficiencies propagate through the system and interact with external factors, to manage their safety impact. While statistical analysis approaches can support the safety assessment, associative reasoning alone is neither sufficient for the safety argumentation nor for the identification and investigation of safety measures. A causal understanding of the system and its interaction with the environment is crucial for safeguarding safety-critical complex systems. It allows to transfer and generalize knowledge, such as insights gained from testing, and facilitates the identification of potential improvements. This work explores using causal Bayesian networks to model the system's causalities for safety analysis, and proposes measures to assess causal influences based on Pearl's framework of causal inference. We compare the approach of causal Bayesian networks to the well-established fault tree analysis, outlining advantages and limitations. In particular, we examine importance metrics typically employed in fault tree analysis as foundation to discuss suitable causal metrics. An evaluation is performed on the example of a perception system for automated driving. Overall, this work presents an approach for causal reasoning in safety analysis that enables the integration of data-driven and expert-based knowledge to account for uncertainties arising from complex systems operating in open environments.
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise - requiring efficient exploration coupled with long-horizon credit assignment - and overcoming these challenges is key for building self-improving agents with superhuman ability. We argue that solving complex and high-dimensional tasks requires solving simpler tasks that are relevant to the target task. In contrast, most prior work designs strategies for selecting exploratory tasks with the objective of solving any task, making exploration of challenging high-dimensional, long-horizon tasks intractable. We find that the sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without needing any prior information. Based on this finding, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent's initial distance to the target, but independent of the volume of the space of all tasks. Empirically, we perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees
In many real-world applications of regression, conditional probability estimation, and uncertainty quantification, exploiting symmetries rooted in physics or geometry can dramatically improve generalization and sample efficiency. While geometric deep learning has made significant empirical advances by incorporating group-theoretic structure, less attention has been given to statistical learning guarantees. In this paper, we introduce an equivariant representation learning framework that simultaneously addresses regression, conditional probability estimation, and uncertainty quantification while providing first-of-its-kind non-asymptotic statistical learning guarantees. Grounded in operator and group representation theory, our framework approximates the spectral decomposition of the conditional expectation operator, building representations that are both equivariant and disentangled along independent symmetry subgroups. Empirical evaluations on synthetic datasets and real-world robotics applications confirm the potential of our approach, matching or outperforming existing equivariant baselines in regression while additionally providing well-calibrated parametric uncertainty estimates.
Integrating emotional intelligence, memory architecture, and gestures to achieve empathetic humanoid robot interaction in an educational setting
This study investigates the integration of individual human traits into an empathetically adaptive educational robot tutor system designed to improve student engagement and learning outcomes with corresponding Engagement Vector measurement. While prior research in the field of Human-Robot Interaction (HRI) has examined the integration of the traits, such as emotional intelligence, memory-driven personalization, and non-verbal communication, by themselves, they have thus-far neglected to consider their synchronized integration into a cohesive, operational education framework. To address this gap, we customize a Multi-Modal Large Language Model (LLaMa 3.2 from Meta) deployed with modules for human-like traits (emotion, memory and gestures) into an AI-Agent framework. This constitutes to the robot's intelligent core mimicing the human emotional system, memory architecture and gesture control to allow the robot to behave more empathetically while recognizing and responding appropriately to the student's emotional state. It can also recall the student's past learning record and adapt its style of interaction accordingly. This allows the robot tutor to react to the student in a more sympathetic manner by delivering personalized verbal feedback synchronized with relevant gestures. Our study investigates the extent of this effect through the introduction of Engagement Vector Model which can be a surveyor's pole for judging the quality of HRI experience. Quantitative and qualitative results demonstrate that such an empathetic responsive approach significantly improves student engagement and learning outcomes compared with a baseline humanoid robot without these human-like traits. This indicates that robot tutors with empathetic capabilities can create a more supportive, interactive learning experience that ultimately leads to better outcomes for the student.
TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning
Developing scalable and generalizable reward engineering for reinforcement learning (RL) is crucial for creating general-purpose agents, especially in the challenging domain of robotic manipulation. While recent advances in reward engineering with Vision-Language Models (VLMs) have shown promise, their sparse reward nature significantly limits sample efficiency. This paper introduces TeViR, a novel method that leverages a pre-trained text-to-video diffusion model to generate dense rewards by comparing the predicted image sequence with current observations. Experimental results across 11 complex robotic tasks demonstrate that TeViR outperforms traditional methods leveraging sparse rewards and other state-of-the-art (SOTA) methods, achieving better sample efficiency and performance without ground truth environmental rewards. TeViR's ability to efficiently guide agents in complex environments highlights its potential to advance reinforcement learning applications in robotic manipulation.
RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback
Vision-Language-Action (VLA) models have demonstrated significant potential in the field of embodied intelligence, enabling agents to follow human instructions to complete complex tasks in physical environments. Existing embodied agents are often trained through behavior cloning, which requires expensive data and computational resources and is constrained by human demonstrations. To address this issue, many researchers explore the application of reinforcement fine-tuning to embodied agents. However, typical reinforcement fine-tuning methods for embodied agents usually rely on sparse, outcome-based rewards, which struggle to provide fine-grained feedback for specific actions within an episode, thus limiting the model's manipulation capabilities and generalization performance. In this paper, we propose RFTF, a novel reinforcement fine-tuning method that leverages a value model to generate dense rewards in embodied scenarios. Specifically, our value model is trained using temporal information, eliminating the need for costly robot action labels. In addition, RFTF incorporates a range of techniques, such as GAE and sample balance to enhance the effectiveness of the fine-tuning process. By addressing the sparse reward problem in reinforcement fine-tuning, our method significantly improves the performance of embodied agents, delivering superior generalization and adaptation capabilities across diverse embodied tasks. Experimental results show that embodied agents fine-tuned with RFTF achieve new state-of-the-art performance on the challenging CALVIN ABC-D with an average success length of 4.296. Moreover, RFTF enables rapid adaptation to new environments. After fine-tuning in the D environment of CALVIN for a few episodes, RFTF achieved an average success length of 4.301 in this new environment.
Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning
Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, often easier to collect and offering greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the extremum of the learned distribution by leveraging the unique properties of Flow Matching, namely, deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, where policies are conditioned on both current and goal observations. We explore and compare different architectural configurations by combining core components, such as critic, planner, actor, or world model, in various ways. We evaluated our agents on the OGBench benchmark and analyzed how different demonstration behaviors during data collection affect performance in a 2D non-prehensile pushing task. Furthermore, we validated our approach on real hardware by deploying it on the Talos humanoid robot to perform complex manipulation tasks based on high-dimensional image observations, featuring a sequence of pick-and-place and articulated object manipulation in a realistic kitchen environment. Experimental videos and code are available at: https://hucebot.github.io/extremum_flow_matching_website/
JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning
Recent advances in model-based reinforcement learning (MBRL) have achieved super-human level performance on the Atari100k benchmark, driven by reinforcement learning agents trained on powerful diffusion world models. However, we identify that the current aggregates mask a major performance asymmetry: MBRL agents dramatically outperform humans in some tasks despite drastically underperforming in others, with the former inflating the aggregate metrics. This is especially pronounced in pixel-based agents trained with diffusion world models. In this work, we address the pronounced asymmetry observed in pixel-based agents as an initial attempt to reverse the worrying upward trend observed in them. We address the problematic aggregates by delineating all tasks as Agent-Optimal or Human-Optimal and advocate for equal importance on metrics from both sets. Next, we hypothesize this pronounced asymmetry is due to the lack of temporally-structured latent space trained with the World Model objective in pixel-based methods. Lastly, to address this issue, we propose Joint Embedding DIffusion (JEDI), a novel latent diffusion world model trained end-to-end with the self-consistency objective. JEDI outperforms SOTA models in human-optimal tasks while staying competitive across the Atari100k benchmark, and runs 3 times faster with 43% lower memory than the latest pixel-based diffusion baseline. Overall, our work rethinks what it truly means to cross human-level performance in Atari100k.
comment: Preprint
GeoPF: Infusing Geometry into Potential Fields for Reactive Planning in Non-trivial Environments
Reactive intelligence remains one of the cornerstones of versatile robotics operating in cluttered, dynamic, and human-centred environments. Among reactive approaches, potential fields (PF) continue to be widely adopted due to their simplicity and real-time applicability. However, existing PF methods typically oversimplify environmental representations by relying on isotropic, point- or sphere-based obstacle approximations. In human-centred settings, this simplification results in overly conservative paths, cumbersome tuning, and computational overhead -- even breaking real-time requirements. In response, we propose the Geometric Potential Field (GeoPF), a reactive motion-planning framework that explicitly infuses geometric primitives - points, lines, planes, cubes, and cylinders - into real-time planning. By leveraging precise closed-form distance functions, GeoPF significantly reduces computational complexity and parameter tuning effort. Extensive quantitative analyses consistently show GeoPF's higher success rates, reduced tuning complexity (a single parameter set across experiments), and substantially lower computational costs (up to 2 orders of magnitude) compared to traditional PF methods. Real-world experiments further validate GeoPF's robustness and practical ease of deployment. GeoPF provides a fresh perspective on reactive planning problems driving geometric-aware temporal motion generation, enabling flexible and low-latency motion planning suitable for modern robotic applications.
Autonomous Flights inside Narrow Tunnels
Multirotors are usually desired to enter confined narrow tunnels that are barely accessible to humans in various applications including inspection, search and rescue, and so on. This task is extremely challenging since the lack of geometric features and illuminations, together with the limited field of view, cause problems in perception; the restricted space and significant ego airflow disturbances induce control issues. This paper introduces an autonomous aerial system designed for navigation through tunnels as narrow as 0.5 m in diameter. The real-time and online system includes a virtual omni-directional perception module tailored for the mission and a novel motion planner that incorporates perception and ego airflow disturbance factors modeled using camera projections and computational fluid dynamics analyses, respectively. Extensive flight experiments on a custom-designed quadrotor are conducted in multiple realistic narrow tunnels to validate the superior performance of the system, even over human pilots, proving its potential for real applications. Additionally, a deployment pipeline on other multirotor platforms is outlined and open-source packages are provided for future developments.
comment: 21 pages, 27 figures, accepted by IEEE Transactions on Robotics (T-RO)
Software Engineering for Self-Adaptive Robotics: A Research Agenda
Self-adaptive robotic systems are designed to operate autonomously in dynamic and uncertain environments, requiring robust mechanisms to monitor, analyse, and adapt their behaviour in real-time. Unlike traditional robotic software, which follows predefined logic, self-adaptive robots leverage artificial intelligence, machine learning, and model-driven engineering to continuously adjust to changing operational conditions while ensuring reliability, safety, and performance. This paper presents a research agenda for software engineering in self-adaptive robotics, addressing critical challenges across two key dimensions: (1) the development phase, including requirements engineering, software design, co-simulation, and testing methodologies tailored to adaptive robotic systems, and (2) key enabling technologies, such as digital twins, model-driven engineering, and AI-driven adaptation, which facilitate runtime monitoring, fault detection, and automated decision-making. We discuss open research challenges, including verifying adaptive behaviours under uncertainty, balancing trade-offs between adaptability, performance, and safety, and integrating self-adaptation frameworks like MAPE-K. By providing a structured roadmap, this work aims to advance the software engineering foundations for self-adaptive robotic systems, ensuring they remain trustworthy, efficient, and capable of handling real-world complexities.
Indoor Air Quality Detection Robot Model Based on the Internet of Things (IoT)
This paper presents the design, implementation, and evaluation of an IoT-based robotic system for mapping and monitoring indoor air quality. The primary objective was to develop a mobile robot capable of autonomously mapping a closed environment, detecting concentrations of CO$_2$, volatile organic compounds (VOCs), smoke, temperature, and humidity, and transmitting real-time data to a web interface. The system integrates a set of sensors (SGP30, MQ-2, DHT11, VL53L0X, MPU6050) with an ESP32 microcontroller. It employs a mapping algorithm for spatial data acquisition and utilizes a Mamdani fuzzy logic system for air quality classification. Empirical tests in a model room demonstrated average localization errors below $5\%$, actuator motion errors under $2\%$, and sensor measurement errors within $12\%$ across all modalities. The contributions of this work include: (1) a low-cost, integrated IoT robotic platform for simultaneous mapping and air quality detection; (2) a web-based user interface for real-time visualization and control; and (3) validation of system accuracy under laboratory conditions.
Whole-body Multi-contact Motion Control for Humanoid Robots Based on Distributed Tactile Sensors
To enable humanoid robots to work robustly in confined environments, multi-contact motion that makes contacts not only at extremities, such as hands and feet, but also at intermediate areas of the limbs, such as knees and elbows, is essential. We develop a method to realize such whole-body multi-contact motion involving contacts at intermediate areas by a humanoid robot. Deformable sheet-shaped distributed tactile sensors are mounted on the surface of the robot's limbs to measure the contact force without significantly changing the robot body shape. The multi-contact motion controller developed earlier, which is dedicated to contact at extremities, is extended to handle contact at intermediate areas, and the robot motion is stabilized by feedback control using not only force/torque sensors but also distributed tactile sensors. Through verification on dynamics simulations, we show that the developed tactile feedback improves the stability of whole-body multi-contact motion against disturbances and environmental errors. Furthermore, the life-sized humanoid RHP Kaleido demonstrates whole-body multi-contact motions, such as stepping forward while supporting the body with forearm contact and balancing in a sitting posture with thigh contacts.
Situationally-Aware Dynamics Learning
Autonomous robots operating in complex, unstructured environments face significant challenges due to latent, unobserved factors that obscure their understanding of both their internal state and the external world. Addressing this challenge would enable robots to develop a more profound grasp of their operational context. To tackle this, we propose a novel framework for online learning of hidden state representations, with which the robots can adapt in real-time to uncertain and dynamic conditions that would otherwise be ambiguous and result in suboptimal or erroneous behaviors. Our approach is formalized as a Generalized Hidden Parameter Markov Decision Process, which explicitly models the influence of unobserved parameters on both transition dynamics and reward structures. Our core innovation lies in learning online the joint distribution of state transitions, which serves as an expressive representation of latent ego- and environmental-factors. This probabilistic approach supports the identification and adaptation to different operational situations, improving robustness and safety. Through a multivariate extension of Bayesian Online Changepoint Detection, our method segments changes in the underlying data generating process governing the robot's dynamics. The robot's transition model is then informed with a symbolic representation of the current situation derived from the joint distribution of latest state transitions, enabling adaptive and context-aware decision-making. To showcase the real-world effectiveness, we validate our approach in the challenging task of unstructured terrain navigation, where unmodeled and unmeasured terrain characteristics can significantly impact the robot's motion. Extensive experiments in both simulation and real world reveal significant improvements in data efficiency, policy performance, and the emergence of safer, adaptive navigation strategies.
LF-GNSS: Towards More Robust Satellite Positioning with a Hard Example Mining Enhanced Learning-Filtering Deep Fusion Framework
Global Navigation Satellite System (GNSS) is essential for autonomous driving systems, unmanned vehicles, and various location-based technologies, as it provides the precise geospatial information necessary for navigation and situational awareness. However, its performance is often degraded by Non-Line-Of-Sight (NLOS) and multipath effects, especially in urban environments. Recently, Artificial Intelligence (AI) has been driving innovation across numerous industries, introducing novel solutions to mitigate the challenges in satellite positioning. This paper presents a learning-filtering deep fusion framework for satellite positioning, termed LF-GNSS. The framework utilizes deep learning networks to intelligently analyze the signal characteristics of satellite observations, enabling the adaptive construction of observation noise covariance matrices and compensated innovation vectors for Kalman filter input. A dynamic hard example mining technique is incorporated to enhance model robustness by prioritizing challenging satellite signals during training. Additionally, we introduce a novel feature representation based on Dilution of Precision (DOP) contributions, which helps to more effectively characterize the signal quality of individual satellites and improve measurement weighting. LF-GNSS has been validated on both public and private datasets, demonstrating superior positioning accuracy compared to traditional methods and other learning-based solutions. To encourage further integration of AI and GNSS research, we will open-source the code at https://github.com/GarlanLou/LF-GNSS, and release a collection of satellite positioning datasets for urban scenarios at https://github.com/GarlanLou/LF-GNSS-Dataset.
Real-time Whole-body Model Predictive Control for Bipedal Locomotion with a Novel Kino-dynamic Model and Warm-start Method
Advancements in optimization solvers and computing power have led to growing interest in applying whole-body model predictive control (WB-MPC) to bipedal robots. However, the high degrees of freedom and inherent model complexity of bipedal robots pose significant challenges in achieving fast and stable control cycles for real-time performance. This paper introduces a novel kino-dynamic model and warm-start strategy for real-time WB-MPC in bipedal robots. Our proposed kino-dynamic model combines the linear inverted pendulum plus flywheel and full-body kinematics model. Unlike the conventional whole-body model that rely on the concept of contact wrenches, our model utilizes the zero-moment point (ZMP), reducing baseline computational costs and ensuring consistently low latency during contact state transitions. Additionally, a modularized multi-layer perceptron (MLP) based warm-start strategy is proposed, leveraging a lightweight neural network to provide a good initial guess for each control cycle. Furthermore, we present a ZMP-based whole-body controller (WBC) that extends the existing WBC for explicitly controlling impulses and ZMP, integrating it into the real-time WB-MPC framework. Through various comparative experiments, the proposed kino-dynamic model and warm-start strategy have been shown to outperform previous studies. Simulations and real robot experiments further validate that the proposed framework demonstrates robustness to perturbation and satisfies real-time control requirements during walking.
comment: This work is currently under revision for possible publication in the IEEE
Heavy lifting tasks via haptic teleoperation of a wheeled humanoid
Humanoid robots can support human workers in physically demanding environments by performing tasks that require whole-body coordination, such as lifting and transporting heavy objects.These tasks, which we refer to as Dynamic Mobile Manipulation (DMM), require the simultaneous control of locomotion, manipulation, and posture under dynamic interaction forces. This paper presents a teleoperation framework for DMM on a height-adjustable wheeled humanoid robot for carrying heavy payloads. A Human-Machine Interface (HMI) enables whole-body motion retargeting from the human pilot to the robot by capturing the motion of the human and applying haptic feedback. The pilot uses body motion to regulate robot posture and locomotion, while arm movements guide manipulation.Real time haptic feedback delivers end effector wrenches and balance related cues, closing the loop between human perception and robot environment interaction. We evaluate the different telelocomotion mappings that offer varying levels of balance assistance, allowing the pilot to either manually or automatically regulate the robot's lean in response to payload-induced disturbances. The system is validated in experiments involving dynamic lifting of barbells and boxes up to 2.5 kg (21% of robot mass), demonstrating coordinated whole-body control, height variation, and disturbance handling under pilot guidance. Video demo can be found at: https://youtu.be/jF270_bG1h8?feature=shared
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
DiffE2E: Rethinking End-to-End Driving with a Hybrid Action Diffusion and Supervised Policy
End-to-end learning has emerged as a transformative paradigm in autonomous driving. However, the inherently multimodal nature of driving behaviors and the generalization challenges in long-tail scenarios remain critical obstacles to robust deployment. We propose DiffE2E, a diffusion-based end-to-end autonomous driving framework. This framework first performs multi-scale alignment of multi-sensor perception features through a hierarchical bidirectional cross-attention mechanism. It then introduces a novel class of hybrid diffusion-supervision decoders based on the Transformer architecture, and adopts a collaborative training paradigm that seamlessly integrates the strengths of both diffusion and supervised policy. DiffE2E models structured latent spaces, where diffusion captures the distribution of future trajectories and supervision enhances controllability and robustness. A global condition integration module enables deep fusion of perception features with high-level targets, significantly improving the quality of trajectory generation. Subsequently, a cross-attention mechanism facilitates efficient interaction between integrated features and hybrid latent variables, promoting the joint optimization of diffusion and supervision objectives for structured output generation, ultimately leading to more robust control. Experiments demonstrate that DiffE2E achieves state-of-the-art performance in both CARLA closed-loop evaluations and NAVSIM benchmarks. The proposed integrated diffusion-supervision policy offers a generalizable paradigm for hybrid action representation, with strong potential for extension to broader domains including embodied intelligence. More details and visualizations are available at \href{https://infinidrive.github.io/DiffE2E/}{project website}.
LLA-MPC: Fast Adaptive Control for Autonomous Racing
We present Look-Back and Look-Ahead Adaptive Model Predictive Control (LLA-MPC), a real-time adaptive control framework for autonomous racing that addresses the challenge of rapidly changing tire-surface interactions. Unlike existing approaches requiring substantial data collection or offline training, LLA-MPC employs a model bank for immediate adaptation without a learning period. It integrates two key mechanisms: a look-back window that evaluates recent vehicle behavior to select the most accurate model and a look-ahead horizon that optimizes trajectory planning based on the identified dynamics. The selected model and estimated friction coefficient are then incorporated into a trajectory planner to optimize reference paths in real-time. Experiments across diverse racing scenarios demonstrate that LLA-MPC outperforms state-of-the-art methods in adaptation speed and handling, even during sudden friction transitions. Its learning-free, computationally efficient design enables rapid adaptation, making it ideal for high-speed autonomous racing in multi-surface environments.
SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control
This paper presents a novel framework that enables real-world humanoid robots to maintain stability while performing human-like motion. Current methods train a policy which allows humanoid robots to follow human body using the massive retargeted human data via reinforcement learning. However, due to the heterogeneity between human and humanoid robot motion, directly using retargeted human motion reduces training efficiency and stability. To this end, we introduce SMAP, a novel whole-body tracking framework that bridges the gap between human and humanoid action spaces, enabling accurate motion mimicry by humanoid robots. The core idea is to use a vector-quantized periodic autoencoder to capture generic atomic behaviors and adapt human motion into physically plausible humanoid motion. This adaptation accelerates training convergence and improves stability when handling novel or challenging motions. We then employ a privileged teacher to distill precise mimicry skills into the student policy with a proposed decoupled reward. We conduct experiments in simulation and real world to demonstrate the superiority stability and performance of SMAP over SOTA methods, offering practical guidelines for advancing whole-body control in humanoid robots.
comment: 15 pages, 11 figures
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planing, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our methods achieves 45.0 PDMS.
comment: 4pages
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Large language models (LLMs) have demonstrated strong performance in various robot control tasks. However, their deployment in real-world applications remains constrained. Even state-ofthe-art LLMs, such as GPT-o4mini, frequently produce invalid action plans that violate physical constraints, such as directing a robot to an unreachable location or causing collisions between robots. This issue primarily arises from a lack of awareness of these physical constraints during the reasoning process. To address this issue, we propose a novel framework that integrates reinforcement learning with verifiable rewards (RLVR) to incentivize knowledge of physical constraints into LLMs to induce constraints-aware reasoning during plan generation. In this approach, only valid action plans that successfully complete a control task receive positive rewards. We applied our method to two small-scale LLMs: a non-reasoning Qwen2.5-3B-Instruct and a reasoning Qwen3-4B. The experiment results demonstrate that constraint-aware small LLMs largely outperform large-scale models without constraints, grounded on both the BoxNet task and a newly developed BoxNet3D environment built using MuJoCo. This work highlights the effectiveness of grounding even small LLMs with physical constraints to enable scalable and efficient multi-robot control in complex, physically constrained environments.
Developing a Robotic Surgery Training System for Wide Accessibility and Research
Robotic surgery represents a major breakthrough in medical interventions, which has revolutionized surgical procedures. However, the high cost and limited accessibility of robotic surgery systems pose significant challenges for training purposes. This study addresses these issues by developing a cost-effective robotic laparoscopy training system that closely replicates advanced robotic surgery setups to ensure broad access for both on-site and remote users. Key innovations include the design of a low-cost robotic end-effector that effectively mimics high-end laparoscopic instruments. Additionally, a digital twin platform was established, facilitating detailed simulation, testing, and real-time monitoring, which enhances both system development and deployment. Furthermore, teleoperation control was optimized, leading to improved trajectory tracking while maintaining remote center of motion (RCM) constraint, with a RMSE of 5 {\mu}m and reduced system latency to 0.01 seconds. As a result, the system provides smooth, continuous motion and incorporates essential safety features, making it a highly effective tool for laparoscopic training.
comment: 6 pages, The IEEE International Conference on Advanced Robotics and Mechatronics (IEEE ARM 2025), accepted
CoRI: Synthesizing Communication of Robot Intent for Physical Human-Robot Interaction
Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot's upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot's image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to statistically significant difference in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot's high-level intentions but also crucial details about its motion and any collaborative user action needed.
comment: 33 pages, 10 figures
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interactions, robots can improve understanding, adapt to, and execute complex tasks in dynamic real-world environments. However, embodied AI in mobile service robots continues to face key challenges, including multimodal sensor fusion, real-time decision-making under uncertainty, task generalization, and effective human-robot interactions (HRI). In this paper, we present the first systematic review of the integration of foundation models in mobile service robotics, identifying key open challenges in embodied AI and examining how foundation models can address them. Namely, we explore the role of such models in enabling real-time sensor fusion, language-conditioned control, and adaptive task execution. Furthermore, we discuss real-world applications in the domestic assistance, healthcare, and service automation sectors, demonstrating the transformative impact of foundation models on service robotics. We also include potential future research directions, emphasizing the need for predictive scaling laws, autonomous long-term adaptation, and cross-embodiment generalization to enable scalable, efficient, and robust deployment of foundation models in human-centric robotic systems.
ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image
Vision-based tactile sensing has been widely used in perception, reconstruction, and robotic manipulation. However, collecting large-scale tactile data remains costly due to the localized nature of sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data, such as simulation and free-form tactile generation, often suffer from unrealistic output and poor transferability to downstream tasks.To address this, we propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. With those physical priors as control input, ControlTac generates physically plausible and varied tactile images that can be used for effective data augmentation. Through experiments on three downstream tasks, we demonstrate that ControlTac can effectively augment tactile datasets and lead to consistent gains. Our three real-world experiments further validate the practical utility of our approach. Project page: https://dongyuluo.github.io/controltac.
comment: 22 pages, 11 figures, 7 tables
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field
In this work, we present WeatherEdit, a novel weather editing pipeline for generating realistic weather effects with controllable types and severity in 3D scenes. Our approach is structured into two key components: weather background editing and weather particle construction. For weather background editing, we introduce an all-in-one adapter that integrates multiple weather styles into a single pretrained diffusion model, enabling the generation of diverse weather effects in 2D image backgrounds. During inference, we design a Temporal-View (TV-) attention mechanism that follows a specific order to aggregate temporal and spatial information, ensuring consistent editing across multi-frame and multi-view images. To construct the weather particles, we first reconstruct a 3D scene using the edited images and then introduce a dynamic 4D Gaussian field to generate snowflakes, raindrops and fog in the scene. The attributes and dynamics of these particles are precisely controlled through physical-based modelling and simulation, ensuring realistic weather representation and flexible severity adjustments. Finally, we integrate the 4D Gaussian field with the 3D scene to render consistent and highly realistic weather effects. Experiments on multiple driving datasets demonstrate that WeatherEdit can generate diverse weather effects with controllable condition severity, highlighting its potential for autonomous driving simulation in adverse weather. See project page: https://jumponthemoon.github.io/w-edit
HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval
We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion of the human hand from the hand demonstration and retrieves robot sub-trajectories in two stages: first filtering by visual similarity, then retrieving trajectories with similar behaviors to the hand. Fine-tuning a policy on the retrieved data enables real-time learning of tasks in under four minutes, without requiring calibrated cameras or detailed hand pose estimation. Experiments also show that HAND outperforms retrieval baselines by over 2x in average task success rates on real robots. Videos can be found at our project website: https://liralab.usc.edu/handretrieval/.
OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation
Visual imitation learning enables robotic agents to acquire skills by observing expert demonstration videos. In the one-shot setting, the agent generates a policy after observing a single expert demonstration without additional fine-tuning. Existing approaches typically train and evaluate on the same set of tasks, varying only object configurations, and struggle to generalize to unseen tasks with different semantic or structural requirements. While some recent methods attempt to address this, they exhibit low success rates on hard test tasks that, despite being visually similar to some training tasks, differ in context and require distinct responses. Additionally, most existing methods lack an explicit model of environment dynamics, limiting their ability to reason about future states. To address these limitations, we propose a novel framework for one-shot visual imitation learning via world-model-guided trajectory generation. Given an expert demonstration video and the agent's initial observation, our method leverages a learned world model to predict a sequence of latent states and actions. This latent trajectory is then decoded into physical waypoints that guide the agent's execution. Our method is evaluated on two simulated benchmarks and three real-world robotic platforms, where it consistently outperforms prior approaches, with over 30% improvement in some cases.
Robot Operation of Home Appliances by Reading User Manuals
Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.
Vision-Based Risk Aware Emergency Landing for UAVs in Complex Urban Environments
Landing safely in crowded urban environments remains an essential yet challenging endeavor for Unmanned Aerial Vehicles (UAVs), especially in emergency situations. In this work, we propose a risk-aware approach that harnesses semantic segmentation to continuously evaluate potential hazards in the drone's field of view. By using a specialized deep neural network to assign pixel-level risk values and applying an algorithm based on risk maps, our method adaptively identifies a stable Safe Landing Zone (SLZ) despite moving critical obstacles such as vehicles, people, etc., and other visual challenges like shifting illumination. A control system then guides the UAV toward this low-risk region, employing altitude-dependent safety thresholds and temporal landing point stabilization to ensure robust descent trajectories. Experimental validation in diverse urban environments demonstrates the effectiveness of our approach, achieving over 90% landing success rates in very challenging real scenarios, showing significant improvements in various risk metrics. Our findings suggest that risk-oriented vision methods can effectively help reduce the risk of accidents in emergency landing situations, particularly in complex, unstructured, urban scenarios, densely populated with moving risky obstacles, while potentiating the true capabilities of UAVs in complex urban operations.
RetroMotion: Retrocausal Motion Forecasting Models are Instructable
Motion forecasts of road users (i.e., agents) vary in complexity as a function of scene constraints and interactive behavior. We address this with a multi-task learning method for motion forecasting that includes a retrocausal flow of information. The corresponding tasks are to forecast (1) marginal trajectory distributions for all modeled agents and (2) joint trajectory distributions for interacting agents. Using a transformer model, we generate the joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. Per trajectory point, we model positional uncertainty using compressed exponential power distributions. Notably, our method achieves state-of-the-art results in the Waymo Interaction Prediction dataset and generalizes well to the Argoverse 2 dataset. Additionally, our method provides an interface for issuing instructions through trajectory modifications. Our experiments show that regular training of motion forecasting leads to the ability to follow goal-based instructions and to adapt basic directional instructions to the scene context. Code: https://github.com/kit-mrt/future-motion
Co-Design of Soft Gripper with Neural Physics
For robot manipulation, both the controller and end-effector design are crucial. Soft grippers are generalizable by deforming to different geometries, but designing such a gripper and finding its grasp pose remains challenging. In this paper, we propose a co-design framework that generates an optimized soft gripper's block-wise stiffness distribution and its grasping pose, using a neural physics model trained in simulation. We derived a uniform-pressure tendon model for a flexure-based soft finger, then generated a diverse dataset by randomizing both gripper pose and design parameters. A neural network is trained to approximate this forward simulation, yielding a fast, differentiable surrogate. We embed that surrogate in an end-to-end optimization loop to optimize the ideal stiffness configuration and best grasp pose. Finally, we 3D-print the optimized grippers of various stiffness by changing the structural parameters. We demonstrate that our co-designed grippers significantly outperform baseline designs in both simulation and hardware experiments.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
Generalizable active mapping in complex unknown environments remains a critical challenge for mobile robots. Existing methods, constrained by insufficient training data and conservative exploration strategies, exhibit limited generalizability across scenes with diverse layouts and complex connectivity. To enable scalable training and reliable evaluation, we introduce GLEAM-Bench, the first large-scale benchmark designed for generalizable active mapping with 1,152 diverse 3D scenes from synthetic and real-scan datasets. Building upon this foundation, we propose GLEAM, a unified generalizable exploration policy for active mapping. Its superior generalizability comes mainly from our semantic representations, long-term navigable goals, and randomized strategies. It significantly outperforms state-of-the-art methods, achieving 66.50% coverage (+9.49%) with efficient trajectories and improved mapping accuracy on 128 unseen complex scenes. Project page: https://xiao-chen.tech/gleam/.
comment: Project page: https://xiao-chen.tech/gleam/
EgoZero: Robot Learning from Smart Glasses
Despite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource remains largely untapped in robot learning. We propose EgoZero, a minimal system that learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses, $\textbf{and zero robot data}$. EgoZero enables: (1) extraction of complete, robot-executable actions from in-the-wild, egocentric, human demonstrations, (2) compression of human visual observations into morphology-agnostic state representations, and (3) closed-loop policy learning that generalizes morphologically, spatially, and semantically. We deploy EgoZero policies on a gripper Franka Panda robot and demonstrate zero-shot transfer with 70% success rate over 7 manipulation tasks and only 20 minutes of data collection per task. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning - paving the way toward a future of abundant, diverse, and naturalistic training data for robots. Code and videos are available at https://egozero-robot.github.io.
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
The rapid evolution of large language models in natural language processing has substantially elevated their semantic understanding and logical reasoning capabilities. Such proficiencies have been leveraged in autonomous driving systems, contributing to significant improvements in system performance. Models such as OpenAI o1 and DeepSeek-R1, leverage Chain-of-Thought (CoT) reasoning, an advanced cognitive method that simulates human thinking processes, demonstrating remarkable reasoning capabilities in complex tasks. By structuring complex driving scenarios within a systematic reasoning framework, this approach has emerged as a prominent research focus in autonomous driving, substantially improving the system's ability to handle challenging cases. This paper investigates how CoT methods improve the reasoning abilities of autonomous driving models. Based on a comprehensive literature review, we present a systematic analysis of the motivations, methodologies, challenges, and future research directions of CoT in autonomous driving. Furthermore, we propose the insight of combining CoT with self-learning to facilitate self-evolution in driving systems. To ensure the relevance and timeliness of this study, we have compiled a dynamic repository of literature and open-source projects, diligently updated to incorporate forefront developments. The repository is publicly available at https://github.com/cuiyx1720/Awesome-CoT4AD.
comment: 18 pages, 6 figures
URPlanner: A Universal Paradigm For Collision-Free Robotic Motion Planning Based on Deep Reinforcement Learning
Collision-free motion planning for redundant robot manipulators in complex environments is yet to be explored. Although recent advancements at the intersection of deep reinforcement learning (DRL) and robotics have highlighted its potential to handle versatile robotic tasks, current DRL-based collision-free motion planners for manipulators are highly costly, hindering their deployment and application. This is due to an overreliance on the minimum distance between the manipulator and obstacles, inadequate exploration and decision-making by DRL, and inefficient data acquisition and utilization. In this article, we propose URPlanner, a universal paradigm for collision-free robotic motion planning based on DRL. URPlanner offers several advantages over existing approaches: it is platform-agnostic, cost-effective in both training and deployment, and applicable to arbitrary manipulators without solving inverse kinematics. To achieve this, we first develop a parameterized task space and a universal obstacle avoidance reward that is independent of minimum distance. Second, we introduce an augmented policy exploration and evaluation algorithm that can be applied to various DRL algorithms to enhance their performance. Third, we propose an expert data diffusion strategy for efficient policy learning, which can produce a large-scale trajectory dataset from only a few expert demonstrations. Finally, the superiority of the proposed methods is comprehensively verified through experiments.
comment: Version 1. 20 pages, 19 figures
DTRT: Enhancing Human Intent Estimation and Role Allocation for Physical Human-Robot Collaboration ICRA 2025
In physical Human-Robot Collaboration (pHRC), accurate human intent estimation and rational human-robot role allocation are crucial for safe and efficient assistance. Existing methods that rely on short-term motion data for intention estimation lack multi-step prediction capabilities, hindering their ability to sense intent changes and adjust human-robot assignments autonomously, resulting in potential discrepancies. To address these issues, we propose a Dual Transformer-based Robot Trajectron (DTRT) featuring a hierarchical architecture, which harnesses human-guided motion and force data to rapidly capture human intent changes, enabling accurate trajectory predictions and dynamic robot behavior adjustments for effective collaboration. Specifically, human intent estimation in DTRT uses two Transformer-based Conditional Variational Autoencoders (CVAEs), incorporating robot motion data in obstacle-free case with human-guided trajectory and force for obstacle avoidance. Additionally, Differential Cooperative Game Theory (DCGT) is employed to synthesize predictions based on human-applied forces, ensuring robot behavior align with human intention. Compared to state-of-the-art (SOTA) methods, DTRT incorporates human dynamics into long-term prediction, providing an accurate understanding of intention and enabling rational role allocation, achieving robot autonomy and maneuverability. Experiments demonstrate DTRT's accurate intent estimation and superior collaboration performance.
comment: Accepted by ICRA 2025
VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion
Recent success in legged robot locomotion is attributed to the integration of reinforcement learning and physical simulators. However, these policies often encounter challenges when deployed in real-world environments due to sim-to-real gaps, as simulators typically fail to replicate visual realism and complex real-world geometry. Moreover, the lack of realistic visual rendering limits the ability of these policies to support high-level tasks requiring RGB-based perception like ego-centric navigation. This paper presents a Real-to-Sim-to-Real framework that generates photorealistic and physically interactive "digital twin" simulation environments for visual navigation and locomotion learning. Our approach leverages 3D Gaussian Splatting (3DGS) based scene reconstruction from multi-view images and integrates these environments into simulations that support ego-centric visual perception and mesh-based physical interactions. To demonstrate its effectiveness, we train a reinforcement learning policy within the simulator to perform a visual goal-tracking task. Extensive experiments show that our framework achieves RGB-only sim-to-real policy transfer. Additionally, our framework facilitates the rapid adaptation of robot policies with effective exploration capability in complex new environments, highlighting its potential for applications in households and factories.
comment: Project Page: https://vr-robo.github.io/
Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning
3D activity reasoning and planning has attracted increasing attention in human-robot interaction and embodied AI thanks to the recent advance in multimodal learning. However, most existing studies are facing two common challenges: 1) heavy reliance on explicit instructions with little reasoning on implicit user intention; 2) negligence of inter-step route planning on robot moves. We address the above challenges by proposing 3D activity reasoning and planning, a novel 3D task that reasons the intended activities from implicit instructions and decomposes them into steps with inter-step routes and planning under the guidance of fine-grained 3D object shapes and locations from scene segmentation. We tackle the new 3D task from two perspectives. First, we construct ReasonPlan3D, a large-scale benchmark that covers diverse 3D scenes with rich implicit instructions and detailed annotations for multi-step task planning, inter-step route planning, and fine-grained segmentation. Second, we design a novel framework that introduces progressive plan generation with contextual consistency across multiple steps, as well as a scene graph that is updated dynamically for capturing critical objects and their spatial relations. Extensive experiments demonstrate the effectiveness of our benchmark and framework in reasoning activities from implicit human instructions, producing accurate stepwise task plans and seamlessly integrating route planning for multi-step moves. The dataset and code will be released.
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control. It includes a task formulation, a large-scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert-level flight trajectories of human pilots and supports direct deployment without sim-to-real gap. We conduct extensive experiments on UAV-Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine-grained Flow setting.
HybridTrack: A Hybrid Approach for Robust Multi-Object Tracking
The evolution of Advanced Driver Assistance Systems (ADAS) has increased the need for robust and generalizable algorithms for multi-object tracking. Traditional statistical model-based tracking methods rely on predefined motion models and assumptions about system noise distributions. Although computationally efficient, they often lack adaptability to varying traffic scenarios and require extensive manual design and parameter tuning. To address these issues, we propose a novel 3D multi-object tracking approach for vehicles, HybridTrack, which integrates a data-driven Kalman Filter (KF) within a tracking-by-detection paradigm. In particular, it learns the transition residual and Kalman gain directly from data, which eliminates the need for manual motion and stochastic parameter modeling. Validated on the real-world KITTI dataset, HybridTrack achieves 82.72% HOTA accuracy, significantly outperforming state-of-the-art methods. We also evaluate our method under different configurations, achieving the fastest processing speed of 112 FPS. Consequently, HybridTrack eliminates the dependency on scene-specific designs while improving performance and maintaining real-time efficiency. The code is publicly available at: https://github.com/leandro-svg/HybridTrack.
comment: IEEE ROBOTICS AND AUTOMATION LETTERS. ACCEPTED MAY, 2025
Diffusion-based learning of contact plans for agile locomotion
Legged robots have become capable of performing highly dynamic maneuvers in the past few years. However, agile locomotion in highly constrained environments such as stepping stones is still a challenge. In this paper, we propose a combination of model-based control, search, and learning to design efficient control policies for agile locomotion on stepping stones. In our framework, we use nonlinear model predictive control (NMPC) to generate whole-body motions for a given contact plan. To efficiently search for an optimal contact plan, we propose to use Monte Carlo tree search (MCTS). While the combination of MCTS and NMPC can quickly find a feasible plan for a given environment (a few seconds), it is not yet suitable to be used as a reactive policy. Hence, we generate a dataset for optimal goal-conditioned policy for a given scene and learn it through supervised learning. In particular, we leverage the power of diffusion models in handling multi-modality in the dataset. We test our proposed framework on a scenario where our quadruped robot Solo12 successfully jumps to different goals in a highly constrained environment.
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram
Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along height dimension and large numerical distribution difference during pillar feature encoding (PFE), which severely limits their performance and quantization potential. To address above issue, we first unveil the importance of different input information during PFE and identify the height dimension as a key factor in enhancing 3D detection performance. Motivated by this observation, we propose a height-aware pillar feature encoder, called PillarHist. Specifically, PillarHist statistics the discrete distribution of points at different heights within one pillar with the information entropy guidance. This simple yet effective design greatly preserves the information along the height dimension while significantly reducing the computation overhead of the PFE. Meanwhile, PillarHist also constrains the arithmetic distribution of PFE input to a stable range, making it quantization-friendly. Notably, PillarHist operates exclusively within the PFE stage to enhance performance, enabling seamless integration into existing pillar-based methods without introducing complex operations. Extensive experiments show the effectiveness of PillarHist in terms of both efficiency and performance.
Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is accessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
comment: Accepted by IEEE RA-L
MOON: Multi-Objective Optimization-Driven Object-Goal Navigation Using a Variable-Horizon Set-Orienteering Planner
Object-goal navigation (ON) enables autonomous robots to locate and reach user-specified objects in previously unknown environments, offering promising applications in domains such as assistive care and disaster response. Existing ON methods -- including training-free approaches, reinforcement learning, and zero-shot planners -- generally depend on active exploration to identify landmark objects (e.g., kitchens or desks), followed by navigation toward semantically related targets (e.g., a specific mug). However, these methods often lack strategic planning and do not adequately address trade-offs among multiple objectives. To overcome these challenges, we propose a novel framework that formulates ON as a multi-objective optimization problem (MOO), balancing frontier-based knowledge exploration with knowledge exploitation over previously observed landmarks; we call this framework MOON (MOO-driven ON). We implement a prototype MOON system that integrates three key components: (1) building on QOM [IROS05], a classical ON system that compactly and discriminatively encodes landmarks based on their semantic relevance to the target; (2) integrating StructNav [RSS23], a recently proposed training-free planner, to enhance the navigation pipeline; and (3) introducing a variable-horizon set orienteering problem formulation to enable global optimization over both exploration and exploitation strategies. This work represents an important first step toward developing globally optimized, next-generation object-goal navigation systems.
comment: 9 pages, 7 figures, technical report
An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems
Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alternative. However, integrating DRL into these robotic systems faces significant challenges, including the requirement for large amounts of training data and the inevitable sim-to-real gap when deployed to real-world robots. This paper proposes an efficient reinforcement learning control framework with sim-to-real transfer to address these challenges. Bootstrap and augmentation enhancements are designed to improve the data efficiency of baseline DRL algorithms, while a sim-to-real transfer technique, namely randomization of muscle dynamics, is adopted to bridge the gap between simulation and real-world deployment. Extensive experiments and ablation studies are conducted utilizing two string-type artificial muscle-driven robotic systems including a two degree-of-freedom robotic eye and a parallel robotic wrist, the results of which demonstrate the effectiveness of the proposed learning control strategy.
Robo-Troj: Attacking LLM-based Task Planners
Robots need task planning methods to achieve goals that require more than individual actions. Recently, large language models (LLMs) have demonstrated impressive performance in task planning. LLMs can generate a step-by-step solution using a description of actions and the goal. Despite the successes in LLM-based task planning, there is limited research studying the security aspects of those systems. In this paper, we develop Robo-Troj, the first multi-trigger backdoor attack for LLM-based task planners, which is the main contribution of this work. As a multi-trigger attack, Robo-Troj is trained to accommodate the diversity of robot application domains. For instance, one can use unique trigger words, e.g., "herical", to activate a specific malicious behavior, e.g., cutting hand on a kitchen robot. In addition, we develop an optimization method for selecting the trigger words that are most effective. Through demonstrating the vulnerability of LLM-based planners, we aim to promote the development of secured robot systems.
A low-cost and lightweight 6 DoF bimanual arm for dynamic and contact-rich manipulation
Dynamic and contact-rich object manipulation, such as striking, snatching, or hammering, remains challenging for robotic systems due to hardware limitations. Most existing robots are constrained by high-inertia design, limited compliance, and reliance on expensive torque sensors. To address this, we introduce ARMADA (Affordable Robot for Manipulation and Dynamic Actions), a 6 degrees-of-freedom bimanual robot designed for dynamic manipulation research. ARMADA combines low-inertia, back-drivable actuators with a lightweight design, using readily available components and 3D-printed links for ease of assembly in research labs. The entire system, including both arms, is built for just $6,100. Each arm achieves speeds up to 6.16m/s, almost twice that of most collaborative robots, with a comparable payload of 2.5kg. We demonstrate ARMADA can perform dynamic manipulation like snatching, hammering, and bimanual throwing in real-world environments. We also showcase its effectiveness in reinforcement learning (RL) by training a non-prehensile manipulation policy in simulation and transferring it zero-shot to the real world, as well as human motion shadowing for dynamic bimanual object throwing. ARMADA is fully open-sourced with detailed assembly instructions, CAD models, URDFs, simulation, and learning codes. We highly recommend viewing the supplementary video at https://sites.google.com/view/im2-humanoid-arm.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Simulation is critical for safety evaluation in autonomous driving, particularly in capturing complex interactive behaviors. However, generating realistic and controllable traffic scenarios in long-tail situations remains a significant challenge. Existing generative models suffer from the conflicting objective between user-defined controllability and realism constraints, which is amplified in safety-critical contexts. In this work, we introduce the Causal Compositional Diffusion Model (CCDiff), a structure-guided diffusion framework to address these challenges. We first formulate the learning of controllable and realistic closed-loop simulation as a constrained optimization problem. Then, CCDiff maximizes controllability while adhering to realism by automatically identifying and injecting causal structures directly into the diffusion process, providing structured guidance to enhance both realism and controllability. Through rigorous evaluations on benchmark datasets and in a closed-loop simulator, CCDiff demonstrates substantial gains over state-of-the-art approaches in generating realistic and user-preferred trajectories. Our results show CCDiff's effectiveness in extracting and leveraging causal structures, showing improved closed-loop performance based on key metrics such as collision rate, off-road rate, FDE, and comfort.
Generalizable Motion Planning via Operator Learning
In this work, we introduce a planning neural operator (PNO) for predicting the value function of a motion planning problem. We recast value function approximation as learning a single operator from the cost function space to the value function space, which is defined by an Eikonal partial differential equation (PDE). Therefore, our PNO model, despite being trained with a finite number of samples at coarse resolution, inherits the zero-shot super-resolution property of neural operators. We demonstrate accurate value function approximation at $16\times$ the training resolution on the MovingAI lab's 2D city dataset, compare with state-of-the-art neural value function predictors on 3D scenes from the iGibson building dataset and showcase optimal planning with 4-DOF robotic manipulators. Lastly, we investigate employing the value function output of PNO as a heuristic function to accelerate motion planning. We show theoretically that the PNO heuristic is $\epsilon$-consistent by introducing an inductive bias layer that guarantees our value functions satisfy the triangle inequality. With our heuristic, we achieve a $30\%$ decrease in nodes visited while obtaining near optimal path lengths on the MovingAI lab 2D city dataset, compared to classical planning methods ($A^\ast$, $RRT^\ast$).
Visual-auditory Extrinsic Contact Estimation
Robust manipulation often hinges on a robot's ability to perceive extrinsic contacts-contacts between a grasped object and its surrounding environment. However, these contacts are difficult to observe through vision alone due to occlusions, limited resolution, and ambiguous near-contact states. In this paper, we propose a visual-auditory method for extrinsic contact estimation that integrates global scene information from vision with local contact cues obtained through active audio sensing. Our approach equips a robotic gripper with contact microphones and conduction speakers, enabling the system to emit and receive acoustic signals through the grasped object to detect external contacts. We train our perception pipeline entirely in simulation and zero-shot transfer to the real world. To bridge the sim-to-real gap, we introduce a real-to-sim audio hallucination technique, injecting real-world audio samples into simulated scenes with ground-truth contact labels. The resulting multimodal model accurately estimates both the location and size of extrinsic contacts across a range of cluttered and occluded scenarios. We further demonstrate that explicit contact prediction significantly improves policy learning for downstream contact-rich manipulation tasks.
comment: 16 pages, 9 figures
The Finer Points: A Systematic Comparison of Point-Cloud Extractors for Radar Odometry
A key element of many odometry pipelines using spinning frequency-modulated continuous-wave (FMCW) radar is the extraction of a point-cloud from the raw signal. This extraction greatly impacts the overall performance of point-cloud-based odometry. This paper provides a first-of-its-kind, comprehensive comparison of 13 common radar point-cloud extractors for the task of iterative closest point based odometry in autonomous driving environments. Each extractor's parameters are tuned and tested on two FMCW radar datasets using approximately 176km of data from public roads. We find that the simplest, and fastest extractor, K-strongest, is the best overall extractor, consistently outperforming the average by 13.59% and 24.94% on each dataset, respectively. Additionally, we highlight the significance of tuning an extractor and the substantial improvement in odometry accuracy that it yields.
comment: 8 pages, 2 figures, 1 table. Submitted to the 22nd Conference on Computer and Robot Vision (CRV 2025)
H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
Large-scale pre-training using videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a simple data augmentation technique that detects human hand keypoints, synthesizes robot motions in simulation, and composites rendered robots into egocentric videos. This process explicitly bridges the visual gap between human and robot embodiments during pre-training. We apply H2R to augment large-scale egocentric human video datasets such as Ego4D and SSv2, replacing human hands with simulated robotic arms to generate robot-centric training data. Based on this, we construct and release a family of 1M-scale datasets covering multiple robot embodiments (UR5 with gripper/Leaphand, Franka) and data sources (SSv2, Ego4D). To verify the effectiveness of the augmentation pipeline, we introduce a CLIP-based image-text similarity metric that quantitatively evaluates the semantic fidelity of robot-rendered frames to the original human actions. We validate H2R across three simulation benchmarks: Robomimic, RLBench and PushT and real-world manipulation tasks with a UR5 robot equipped with Gripper and Leaphand end-effectors. H2R consistently improves downstream success rates, yielding gains of 5.0%-10.2% in simulation and 6.7%-23.3% in real-world tasks across various visual encoders and policy learning methods. These results indicate that H2R improves the generalization ability of robotic policies by mitigating the visual discrepancies between human and robot domains.
Marginalizing and Conditioning Gaussians onto Linear Approximations of Smooth Manifolds with Applications in Robotics ICRA 2025
We present closed-form expressions for marginalizing and conditioning Gaussians onto linear manifolds, and demonstrate how to apply these expressions to smooth nonlinear manifolds through linearization. Although marginalization and conditioning onto axis-aligned manifolds are well-established procedures, doing so onto non-axis-aligned manifolds is not as well understood. We demonstrate the utility of our expressions through three applications: 1) approximation of the projected normal distribution, where the quality of our linearized approximation increases as problem nonlinearity decreases; 2) covariance extraction in Koopman SLAM, where our covariances are shown to be consistent on a real-world dataset; and 3) covariance extraction in constrained GTSAM, where our covariances are shown to be consistent in simulation.
comment: Final version in IEEE ICRA 2025 (winner of the Best Conference Paper Award)
PECAN: Personalizing Robot Behaviors through a Learned Canonical Space
Robots should personalize how they perform tasks to match the needs of individual human users. Today's robot achieve this personalization by asking for the human's feedback in the task space. For example, an autonomous car might show the human two different ways to decelerate at stoplights, and ask the human which of these motions they prefer. This current approach to personalization is indirect: based on the behaviors the human selects (e.g., decelerating slowly), the robot tries to infer their underlying preference (e.g., defensive driving). By contrast, our paper develops a learning and interface-based approach that enables humans to directly indicate their desired style. We do this by learning an abstract, low-dimensional, and continuous canonical space from human demonstration data. Each point in the canonical space corresponds to a different style (e.g., defensive or aggressive driving), and users can directly personalize the robot's behavior by simply clicking on a point. Given the human's selection, the robot then decodes this canonical style across each task in the dataset -- e.g., if the human selects a defensive style, the autonomous car personalizes its behavior to drive defensively when decelerating, passing other cars, or merging onto highways. We refer to our resulting approach as PECAN: Personalizing Robot Behaviors through a Learned Canonical Space. Our simulations and user studies suggest that humans prefer using PECAN to directly personalize robot behavior (particularly when those users become familiar with PECAN), and that users find the learned canonical space to be intuitive and consistent. See videos here: https://youtu.be/wRJpyr23PKI
comment: 33 pages, 11 figures, ACM Transactions on Human-Robot Interaction
Multiagent Systems
Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications
Multi-Agent Reinforcement Learning (MARL) has shown great potential as an adaptive solution for addressing modern cybersecurity challenges. MARL enables decentralized, adaptive, and collaborative defense strategies and provides an automated mechanism to combat dynamic, coordinated, and sophisticated threats. This survey investigates the current state of research in MARL applications for automated cyber defense (ACD), focusing on intruder detection and lateral movement containment. Additionally, it examines the role of Autonomous Intelligent Cyber-defense Agents (AICA) and Cyber Gyms in training and validating MARL agents. Finally, the paper outlines existing challenges, such as scalability and adversarial robustness, and proposes future research directions. This also discusses how MARL integrates in AICA to provide adaptive, scalable, and dynamic solutions to counter the increasingly sophisticated landscape of cyber threats. It highlights the transformative potential of MARL in areas like intrusion detection and lateral movement containment, and underscores the value of Cyber Gyms for training and validation of AICA.
Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning
In standard reinforcement learning, an episode is defined as a sequence of interactions between agents and the environment, which terminates upon reaching a terminal state or a pre-defined episode length. Setting a shorter episode length enables the generation of multiple episodes with the same number of data samples, thereby facilitating an exploration of diverse states. While shorter episodes may limit the collection of long-term interactions, they may offer significant advantages when properly managed. For example, trajectory truncation in single-agent reinforcement learning has shown how the benefits of shorter episodes can be leveraged despite the trade-off of reduced long-term interaction experiences. However, this approach remains underexplored in MARL. This paper proposes a novel MARL approach, Adaptive Episode Length Adjustment (AELA), where the episode length is initially limited and gradually increased based on an entropy-based assessment of learning progress. By starting with shorter episodes, agents can focus on learning effective strategies for initial states and minimize time spent in dead-end states. The use of entropy as an assessment metric prevents premature convergence to suboptimal policies and ensures balanced training over varying episode lengths. We validate our approach using the StarCraft Multi-agent Challenge (SMAC) and a modified predator-prey environment, demonstrating significant improvements in both convergence speed and overall performance compared to existing methods. To the best of our knowledge, this is the first study to adaptively adjust episode length in MARL based on learning progress.
Multi-Agent Collaboration via Evolving Orchestration
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution.
comment: Work in Progress
LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer
This study presents the LLM-Agent-Controller, a multi-agent large language model (LLM) system developed to address a wide range of problems in control engineering (Control Theory). The system integrates a central controller agent with multiple specialized auxiliary agents, responsible for tasks such as controller design, model representation, control analysis, time-domain response, and simulation. A supervisor oversees high-level decision-making and workflow coordination, enhancing the system's reliability and efficiency. The LLM-Agent-Controller incorporates advanced capabilities, including Retrieval-Augmented Generation (RAG), Chain-of-Thought reasoning, self-criticism and correction, efficient memory handling, and user-friendly natural language communication. It is designed to function without requiring users to have prior knowledge of Control Theory, enabling them to input problems in plain language and receive complete, real-time solutions. To evaluate the system, we propose new performance metrics assessing both individual agents and the system as a whole. We test five categories of Control Theory problems and benchmark performance across three advanced LLMs. Additionally, we conduct a comprehensive qualitative conversational analysis covering all key services. Results show that the LLM-Agent-Controller successfully solved 83% of general tasks, with individual agents achieving an average success rate of 87%. Performance improved with more advanced LLMs. This research demonstrates the potential of multi-agent LLM architectures to solve complex, domain-specific problems. By integrating specialized agents, supervisory control, and advanced reasoning, the LLM-Agent-Controller offers a scalable, robust, and accessible solution framework that can be extended to various technical domains.
DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Existing medical RAG systems mainly leverage knowledge from medical knowledge bases, neglecting the crucial role of experiential knowledge derived from similar patient cases -- a key component of human clinical reasoning. To bridge this gap, we propose DoctorRAG, a RAG framework that emulates doctor-like reasoning by integrating both explicit clinical knowledge and implicit case-based experience. DoctorRAG enhances retrieval precision by first allocating conceptual tags for queries and knowledge sources, together with a hybrid retrieval mechanism from both relevant knowledge and patient. In addition, a Med-TextGrad module using multi-agent textual gradients is integrated to ensure that the final output adheres to the retrieved knowledge and patient query. Comprehensive experiments on multilingual, multitask datasets demonstrate that DoctorRAG significantly outperforms strong baseline RAG models and gains improvements from iterative refinements. Our approach generates more accurate, relevant, and comprehensive responses, taking a step towards more doctor-like medical reasoning systems.
comment: 32 pages, 5 figures, 5 tables
VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks, and are increasingly deployed as agents in dynamic environments such as code generation and recommendation systems. However, many real-world applications, such as high-frequency trading and real-time competitive gaming, require decisions under strict latency constraints, where faster responses directly translate into higher rewards. Despite the importance of this latency quality trade off, it remains underexplored in the context of LLM based agents. In this work, we present the first systematic study of this trade off in real time decision making tasks. To support our investigation, we introduce two new benchmarks: HFTBench, a high frequency trading simulation, and StreetFighter, a competitive gaming platform. Our analysis reveals that optimal latency quality balance varies by task, and that sacrificing quality for lower latency can significantly enhance downstream performance. To address this, we propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real time demands. Our method achieves the best performance on both benchmarks, improving win rate by up to 80% in Street Fighter and boosting daily yield by up to 26.52% in trading, underscoring the need for latency aware evaluation and deployment strategies for LLM based agents. These results demonstrate the critical importance of latency aware evaluation and deployment strategies for real world LLM based agents. Our benchmarks are available at Latency Sensitive Benchmarks.
The challenge of hidden gifts in multi-agent reinforcement learning
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a "hidden gift". We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of "hidden gifts", and demonstrate that learning awareness in independent agents can benefit these settings.
xChemAgents: Agentic AI for Explainable Quantum Chemistry ICML 2025
Recent progress in multimodal graph neural networks has demonstrated that augmenting atomic XYZ geometries with textual chemical descriptors can enhance predictive accuracy across a range of electronic and thermodynamic properties. However, naively appending large sets of heterogeneous descriptors often degrades performance on tasks sensitive to molecular shape or symmetry, and undermines interpretability. xChemAgents proposes a cooperative agent framework that injects physics-aware reasoning into multimodal property prediction. xChemAgents comprises two language-model-based agents: a Selector, which adaptively identifies a sparse, weighted subset of descriptors relevant to each target, and provides a natural language rationale; and a Validator, which enforces physical constraints such as unit consistency and scaling laws through iterative dialogue. On standard benchmark datasets, xChemAgents achieves up to a 22\% reduction in mean absolute error over strong baselines, while producing faithful, human-interpretable explanations. Experiment results highlight the potential of cooperative, self-verifying agents to enhance both accuracy and transparency in foundation-model-driven materials science. The implementation and accompanying dataset are available anonymously at https://github.com/KurbanIntelligenceLab/xChemAgents.
comment: Submitted to ICML 2025 Workshop on MAS
Reconceptualizing Smart Microscopy: From Data Collection to Knowledge Creation by Multi-Agent Integration
Smart microscopy represents a paradigm shift in biological imaging, moving from passive observation tools to active collaborators in scientific inquiry. Enabled by advances in automation, computational power, and artificial intelligence, these systems are now capable of adaptive decision-making and real-time experimental control. Here, we introduce a theoretical framework that reconceptualizes smart microscopy as a partner in scientific investigation. Central to our framework is the concept of the 'epistemic-empirical divide' in cellular investigation-the gap between what is observable (empirical domain) and what must be understood (epistemic domain). We propose six core design principles: epistemic-empirical awareness, hierarchical context integration, an evolution from detection to perception, adaptive measurement frameworks, narrative synthesis capabilities, and cross-contextual reasoning. Together, these principles guide a multi-agent architecture designed to align empirical observation with the goals of scientific understanding. Our framework provides a roadmap for building microscopy systems that go beyond automation to actively support hypothesis generation, insight discovery, and theory development, redefining the role of scientific instruments in the process of knowledge creation.
comment: 34 pages, 5 figures
Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework
In cloud-native systems, Kubernetes clusters with interdependent services often face challenges to their operational resilience due to poor workload management issues such as resource blocking, bottlenecks, or continuous pod crashes. These vulnerabilities are further amplified in adversarial scenarios, such as Distributed Denial-of-Service attacks (DDoS). Conventional Horizontal Pod Autoscaling (HPA) approaches struggle to address such dynamic conditions, while reinforcement learning-based methods, though more adaptable, typically optimize single goals like latency or resource usage, neglecting broader failure scenarios. We propose decomposing the overarching goal of maintaining operational resilience into failure-specific sub-goals delegated to collaborative agents, collectively forming an HPA Multi-Agent System (MAS). We introduce an automated, four-phase online framework for HPA MAS design: 1) modeling a digital twin built from cluster traces; 2) training agents in simulation using roles and missions tailored to failure contexts; 3) analyzing agent behaviors for explainability; and 4) transferring learned policies to the real cluster. Experimental results demonstrate that the generated HPA MASs outperform three state-of-the-art HPA systems in sustaining operational resilience under various adversarial conditions in a proposed complex cluster.
Federated Domain Generalization with Data-free On-server Matching Gradient ICLR
Domain Generalization (DG) aims to learn from multiple known source domains a model that can generalize well to unknown target domains. One of the key approaches in DG is training an encoder which generates domain-invariant representations. However, this approach is not applicable in Federated Domain Generalization (FDG), where data from various domains are distributed across different clients. In this paper, we introduce a novel approach, dubbed Federated Learning via On-server Matching Gradient (FedOMG), which can \emph{efficiently leverage domain information from distributed domains}. Specifically, we utilize the local gradients as information about the distributed models to find an invariant gradient direction across all domains through gradient inner product maximization. The advantages are two-fold: 1) FedOMG can aggregate the characteristics of distributed models on the centralized server without incurring any additional communication cost, and 2) FedOMG is orthogonal to many existing FL/FDG methods, allowing for additional performance improvements by being seamlessly integrated with them. Extensive experimental evaluations on various settings to demonstrate the robustness of FedOMG compared to other FL/FDG baselines. Our method outperforms recent SOTA baselines on four FL benchmark datasets (MNIST, EMNIST, CIFAR-10, and CIFAR-100), and three FDG benchmark datasets (PACS, VLCS, and OfficeHome).
comment: 26 pages, 15 figures, ICLR
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning
Semantic communication transmits the extracted features of information rather than raw data, significantly reducing redundancy, which is crucial for addressing spectrum and energy challenges in 6G networks. In this paper, we introduce semantic communication into a cellular vehicle-to-everything (C-V2X)- based autonomous vehicle platoon system for the first time, aiming to achieve efficient management of communication resources in a dynamic environment. Firstly, we construct a mathematical model for semantic communication in platoon systems, in which the DeepSC model and MU-DeepSC model are used to semantically encode and decode unimodal and multi-modal data, respectively. Then, we propose the quality of experience (QoE) metric based on semantic similarity and semantic rate. Meanwhile, we consider the success rate of semantic information transmission (SRS) metric to ensure the fairness of channel resource allocation. Next, the optimization problem is posed with the aim of maximizing the QoE in vehicle-to-vehicle (V2V) links while improving SRS. To solve this mixed integer nonlinear programming problem (MINLP) and adapt to time-varying channel conditions, the paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL. The algorithm can dynamically allocate channels and power and determine semantic symbol length based on the contextual importance of the transmitted information, ensuring efficient resource utilization. Finally, extensive simulations have demonstrated that SAMRAMARL outperforms existing methods, achieving significant gains in QoE, SRS, and communication delay in C-V2X platooning scenarios.
comment: This paper has been submitted to IEEE Journal. The source code has been released at:https://github.com/qiongwu86/Semantic-Aware-Resource-Management-for-C-V2X-Platooning-via-Multi-Agent-Reinforcement-Learning
Fast and Robust Flocking of Protesters on Street Networks
We propose a simple model of protesters scattered throughout a city who want to gather into large and mobile groups. This model relies on random walkers on a street network that follow tactics built from a set of basic rules. Our goal is to identify the most important rules for fast and robust flocking of walkers. We explore a wide set of tactics and show the central importance of a specific rule based on alignment. Other rules alone perform poorly, but our experiments show that combining alignment with them enhances flocking, and that obtained groups are then remarkably robust.
Sable: a Performant, Efficient and Scalable Sequence Model for MARL
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modeling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage.
Systems and Control (CS)
Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks
Time series forecasting plays a critical role in domains such as energy, finance, and healthcare, where accurate predictions inform decision-making under uncertainty. Although Transformer-based models have demonstrated success in sequential modeling, their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure. In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures-Autoformer, Informer, and Patchtst-each evaluated through three architectural variants: Minimal, Standard, and Full, representing increasing levels of complexity and modeling capacity. We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions. Our analysis reveals consistent patterns across model families. To advance this landscape further, we introduce the Koopman-enhanced Transformer framework, Deep Koopformer, which integrates operator-theoretic latent state modeling to improve stability and interpretability. We demonstrate its efficacy on nonlinear and chaotic dynamical systems. Our results highlight Koopman based Transformer as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting in noisy and complex real-world conditions.
Efficient Gaussian Mixture Filters based on Transition Density Approximation
Gaussian mixture filters for nonlinear systems usually rely on severe approximations when calculating mixtures in the prediction and filtering step. Thus, offline approximations of noise densities by Gaussian mixture densities to reduce the approximation error have been proposed. This results in exponential growth in the number of components, requiring ongoing component reduction, which is computationally complex. In this paper, the key idea is to approximate the true transition density by an axis-aligned Gaussian mixture, where two different approaches are derived. These approximations automatically ensure a constant number of components in the posterior densities without the need for explicit reduction. In addition, they allow a trade-off between estimation quality and computational complexity.
comment: Submitted to the conference FUSION 2025
A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup
In cluttered spaces, such as forests, drone picking up a payload via an abseil claw is an open challenge, as the cable is likely tangled and blocked by the branches and obstacles. To address such a challenge, in this work, a cooperative aerial system is proposed, which consists of a payload drone and a dexterous rappelling end droid. The two ends are linked via a Kevlar tether cable. The end droid is actuated by four propellers, which enable mid-air dexterous adjustment of clawing angle and guidance of cable movement. To avoid tanglement and rappelling obstacles, a trajectory optimization method that integrates cable length constraints and dynamic feasibility is developed, which guarantees safe pickup. A tether cable dynamic model is established to evaluate real-time cable status, considering both taut and sagging conditions. Simulation and real-world experiments are conducted to demonstrate that the proposed system is capable of picking up payload in cluttered spaces. As a result, the end droid can reach the target point successfully under cable constraints and achieve passive retrieval during the lifting phase without propulsion, which enables effective and efficient aerial manipulation.
comment: Video: https://youtu.be/dKrmzPdnblY
Interpretable Augmented Physics-Based Model for Estimation and Tracking
State-space estimation and tracking rely on accurate dynamical models to perform well. However, obtaining an vaccurate dynamical model for complex scenarios or adapting to changes in the system poses challenges to the estimation process. Recently, augmented physics-based models (APBMs) appear as an appealing strategy to cope with these challenges where the composition of a small and adaptive neural network with known physics-based models (PBM) is learned on the fly following an augmented state-space estimation approach. A major issue when introducing data-driven components in such a scenario is the danger of compromising the meaning (or interpretability) of estimated states. In this work, we propose a novel constrained estimation strategy that constrains the APBM dynamics close to the PBM. The novel state-space constrained approach leads to more flexible ways to impose constraints than the traditional APBM approach. Our experiments with a radar-tracking scenario demonstrate different aspects of the proposed approach and the trade-offs inherent in the imposed constraints.
comment: Submitted to the FUSION 2025 conference
Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees
Open-weight LLM zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of 2x cost savings compared to existing LLM routing techniques.
comment: Preprint. Under review
Signed Angle Rigid Graphs for Network Localization and Formation Control
Graph rigidity theory studies the capability of a graph embedded in the Euclidean space to constrain its global geometric shape via local constraints among nodes and edges, and has been widely exploited in network localization and formation control. In recent years, the traditional rigidity theory has been extended by considering new types of local constraints such as bearing, angle, ratio of distance, etc. Among them, the signed angle constraint received extensive attention since it is practically measurable and independent of the global coordinate frame. However, existing studies on signed angle rigidity always consider special graph structures, which are actually not necessary. This paper presents a comprehensive combinatorial analysis in terms of graphs and angle index sets for signed angle rigidity. We show that Laman graphs equivalently characterize minimally signed angle rigid graphs. Moreover, we propose a method to construct the minimal set of signed angle constraints in a Laman graph to effectively ensure signed angle rigidity. These results are finally applied to distributed network localization and formation stabilization problems, respectively, where each agent only has access to signed angle measurements.
Persistently Exciting Online Feedback Optimization Controller with Minimal Perturbations
This paper develops a persistently exciting input generating Online Feedback Optimization (OFO) controller that estimates the sensitivity of a process ensuring minimal deviations from the descent direction while converging. This eliminates the need for random perturbations in feedback loop. The proposed controller is formulated as a bilevel optimization program, where a nonconvex full rank constraint is relaxed using linear constraints and penalization. The validation of the method is performed in a simulated scenario where multiple systems share a limited, costly resource for production optimization, simulating an oil and gas resource allocation problem. The method allows for less input perturbations while accurately estimating gradients, allowing faster convergence when the gradients are unknown. In the case study, the proposed method achieved the same profit compared to an OFO controller with random input perturbations, and $1.4\%$ higher profit compared to an OFO controller without input perturbations.
comment: Accepted to CCTA 2025
Optimizing Offshore Wind Integration through Multi-Terminal DC Grids: A Market-Based OPF Framework for the North Sea Interconnectors
Interconnecting price zones and remote renewable energy sources has emerged as a key solution to achieving climate goals. The objective of this work is to present a formulation that extends the base optimal power flow model with price zones constraints to forecast the operations of upcoming offshore wind developments integrated into a multi-terminal DC grid. A case study based on the 2030 development of the North Sea is used to exemplify the utilization of the formulation. Here, three cases are presented, one with the price as a parameter and the other two with the price as a variable dependent on power flows between price zones. The paper demonstrates that, for large power flows, it is necessary to include additional constraints beyond line limitations to accurately capture the effects of price zone exchanges.
comment: 6 pages, ACDC Global 2025 The 22nd IET International Conference on AC and DC Power Transmission
A fully automated urban PV parameterization framework for improved estimation of energy production profiles
Accurate parameterization of rooftop photovoltaic (PV) installations is critical for effective grid management and strategic large-scale solar deployment. The lack of high-fidelity datasets for PV configuration parameters often compels practitioners to rely on coarse assumptions, undermining both the temporal and numerical accuracy of large-scale PV performance modeling. This study introduces a fully automated framework that innovatively integrates remote sensing data, semantic segmentation, polygon-vector refinement, tilt-azimuth estimation, and module layout inference to produce a richly attributed GIS dataset of distributed PV. Applied to Eindhoven (the Netherlands), the method achieves a correlation ($R^2$) of 0.92 with Distribution System Operator (DSO) records, while capacity estimates for 73$\%$ of neighborhoods demonstrate agreement within a $\pm$25$\%$ margin of recorded data. Additionally, by accurately capturing actual system configuration parameters (e.g., tilt, azimuth, module layout) and seamlessly linking them to advanced performance models, the method yields more reliable PV energy generation forecasts within the distribution networks. Centering our experiments toward a high PV-penetration community, configuration-aware simulations help to reduce Mean Absolute Percentage Error (MAPE) of energy generation modeling by up to 160$\%$ compared to the conventional assumption-based approaches. Furthermore, owing to its modular design and reliance on readily available geospatial resources, the workflow can be extended across diverse regions, offering a scalable solution for robust urban solar integration.
comment: This manuscript has been submitted to Solar Energy for peer review. 42 pages, 15 figures
Chance-constrained Solar PV Hosting Capacity Assessment for Distribution Grids Using Gaussian Process and Logit Learning
Growing penetration of distributed generation such as solar PV can increase the risk of over-voltage in distribution grids, affecting network security. Therefore, assessment of the so-called, PV hosting capacity (HC) - the maximum amount of PV that a given grid can accommodate becomes an important practical problem. In this paper, we propose a novel chance-constrained HC estimation framework using Gaussian Process and Logit learning that can account for uncertainty and risk management. Also, we consider the assessment of HC under different voltage control strategies. Our results have demonstrated that the proposed models can achieve high accuracy levels of up to 93% in predicting nodal over-voltage events on IEEE 33-bus and 123-bus test-cases. Thus, these models can be effectively employed to estimate the chance-constrained HC with various risk levels. Moreover, our proposed methods have simple forms and low computational costs of only a few seconds.
Scalable quantile predictions of peak loads for non-residential customer segments
Electrical grid congestion has emerged as an immense challenge in Europe, making the forecasting of load and its associated metrics increasingly crucial. Among these metrics, peak load is fundamental. Non-time-resolved models of peak load have their advantages of being simple and compact, and among them Velander's formula (VF) is widely used in distribution network planning. However, several aspects of VF remain inadequately addressed, including year-ahead prediction, scaling of customers, aggregation, and, most importantly, the lack of probabilistic elements. The present paper proposes a quantile interpretation of VF that enables VF to learn truncated cumulative distribution functions of peak loads with multiple quantile regression under non-crossing constraints. The evaluations on non-residential customer data confirmed its ability to predict peak load year ahead, to fit customers with a wide range of electricity consumptions, and to model aggregations of customers. A noteworthy finding is that for a given electricity consumption, aggregations of customers have statistically larger peak loads than a single customer.
comment: submitted to IEEE PES ISGT (Innovative Smart Grid Technologies) Europe 2025 conference
Range Space or Null Space: Least-Squares Methods for the Realization Problem
This contribution revisits the classical approximate realization problem, which involves determining matrices of a state-space model based on estimates of a truncated series of Markov parameters. A Hankel matrix built up by these Markov parameters plays a fundamental role in this problem, leveraging the fact that both its range space and left null space encode critical information about the state-space model. We examine two prototype realization algorithms based on the Hankel matrix: the classical range-space-based (SVD-based) method and the more recent null-space-based method. It is demonstrated that the range-space-based method corresponds to a total least-squares solution, whereas the null-space-based method corresponds to an ordinary least-squares solution. By analyzing the differences in sensitivity of the two algorithms, we determine the conditions when one or the other realization algorithm is to be preferred, and identify factors that contribute to an ill-conditioned realization problem. Furthermore, recognizing that both methods are suboptimal, we argue that the optimal realization is obtained through a weighted least-squares approach. A statistical analysis of these methods, including their consistency and asymptotic normality is also provided.
Customising Electricity Contracts at Scale with Large Language Models
The electricity system becomes more complex, connecting massive numbers of end-users and distributed generators. Adding or removing grid connections requires expert studies to align technical constraints with user requests. In times of labour shortages, carrying out these studies represents a significant amount of time that engineers at system operators spend in planning departments. As time is limited, only standard block connectivity contracts can be offered to end-users, or the requests pile up. Even if offers are made, these often do not perfectly match the user's requirements, leading to overpaying or underusing the grid capacity. This paper investigates whether end-users can negotiate individual, flexible time-of-use contracts directly with the grid using Large Language Models (LLM) in chats at scale. The LLM-based chat has direct access to a model of the grid and studies the grid's technical constraints just as an expert engineer. The advantage of this system is that end-users can directly interact with grid models through natural language; no intermediate is needed to service, analyse, study, assess, advise, consult and engineer. This initial study paves the way toward developing this tailored LLM system, resulting in possible high-efficiency gains for grid planning and customer management.
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
Synchronous Models and Fundamental Systems in Observer Design
This paper introduces the concept of a synchronous model as an extension of the internal model concept used in observer design for dynamical systems. A system is said to contain a synchronous model of another if there is a suitable error function between the two systems that remains stationary for all of the trajectories of the two systems. A system is said to admit a synchronous lift if a second system containing a synchronous model exists. We provide necessary and sufficient conditions that a system admits a synchronous lift and provide a method to construct a (there may be many) lifted system should one exist. We characterise the class of all systems that admit a synchronous lift by showing that they consist of fundamental vector fields induced by a Lie group action, a class of system we term fundamental systems. For fundamental systems we propose a simple synchronous observer design methodology, for which we show how correction terms can be discretised and combined easily, facilitating global characterisation of convergence and performance. Finally, we provide three examples to demonstrate the key concepts of synchrony, symmetry construction, and observer design for a fundamental system.
comment: 19 pages, 3 figures, submitted to IEEE TAC
LLA-MPC: Fast Adaptive Control for Autonomous Racing
We present Look-Back and Look-Ahead Adaptive Model Predictive Control (LLA-MPC), a real-time adaptive control framework for autonomous racing that addresses the challenge of rapidly changing tire-surface interactions. Unlike existing approaches requiring substantial data collection or offline training, LLA-MPC employs a model bank for immediate adaptation without a learning period. It integrates two key mechanisms: a look-back window that evaluates recent vehicle behavior to select the most accurate model and a look-ahead horizon that optimizes trajectory planning based on the identified dynamics. The selected model and estimated friction coefficient are then incorporated into a trajectory planner to optimize reference paths in real-time. Experiments across diverse racing scenarios demonstrate that LLA-MPC outperforms state-of-the-art methods in adaptation speed and handling, even during sudden friction transitions. Its learning-free, computationally efficient design enables rapid adaptation, making it ideal for high-speed autonomous racing in multi-surface environments.
VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Direct Pseudospectral Optimal Control by Orthogonal Polynomial Integral Collocation
This paper details a methodology to transcribe an optimal control problem into a nonlinear program for generation of the trajectories that optimize a given functional by approximating only the highest order derivatives of a given system's dynamics. The underlying method uses orthogonal polynomial integral collocation by which successive integrals are taken to approximate all lower order states. Hence, one set of polynomial coefficients can represent an entire coordinate's degree of freedom. Specifically, Chebyshev polynomials of the first and second kind and Legendre polynomials are used over their associated common interpolating grids derived from the bases' roots and extrema. Simple example problems compare different polynomial bases' performance to analytical solutions. The planar circular orbit raising problem is used to verify the method with solutions obtained by other pseudospectral methods in literature. Finally, a rocket landing flip maneuver problem is solved to demonstrate the ability to solve complex problems with multiple states and control variables with constraints. Simulations establish this method's performance, and reveal that the polynomial/node choice for a given problem notably affects the performance.
comment: Preprint submitted to the Journal of Guidance, Control, and Dynamics for publication
Split-as-a-Pro: behavioral control via operator splitting and alternating projections
The paper introduces Split-as-a-Pro, a control framework that integrates behavioral systems theory, operator splitting methods, and alternating projection algorithms. The framework reduces dynamic optimization problems - arising in both control and estimation - to efficient projection computations. Split-as-a-Pro builds on a non-parametric formulation that exploits system structure to separate dynamic constraints imposed by individual subsystems from external ones, such as interconnection constraints and input/output constraints. This enables the use of arbitrary system representations, as long as the associated projection is efficiently computable, thereby enhancing scalability and compatibility with gray-box modeling. We demonstrate the effectiveness of Split-as-a-Pro by developing a distributed algorithm for solving finite-horizon linear quadratic control problems and illustrate its use in predictive control. Our numerical case studies show that algorithms obtained using Split-as-a-Pro significantly outperform their centralized counterparts in runtime and scalability across various standard graph topologies, while seamlessly leveraging both model-based and data-driven system representations.
Alignment of large language models with constrained learning
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM policies at near-optimal dual variables with respect to both the objective and the constraint functions. These results prove that dual-based alignment methods can find an optimal constrained LLM policy, up to an LLM parametrization gap. We demonstrate the effectiveness and merits of our approach through extensive experiments conducted on the PKU-SafeRLHF dataset.
comment: 48 pages, 7 figures, 7 tables
Privacy-Preserving Peer-to-Peer Energy Trading via Hybrid Secure Computations
The massive integration of uncertain distributed renewable energy resources into power systems raises power imbalance concerns. Peer-to-peer (P2P) energy trading provides a promising way to balance the prosumers' volatile energy power generation and demands locally. Particularly, to protect the privacy of prosumers, distributed P2P energy trading is broadly advocated. However, severe privacy leakage issues can emerge in the realistic fully distributed P2P energy trading paradigm. Meanwhile, in this paradigm, two-party and multi-party computations coexist, challenging the naive privacy-preserving techniques. To tackle privacy leakage issues arising from the fully distributed P2P energy trading, this paper proposes a privacy-preserving approach via hybrid secure computations. A secure multi-party computation mechanism consisting of offline and online phases is developed to ensure the security of shared data by leveraging the tailored secret sharing method. In addition, the Paillier encryption method based on the Chinese Remainder Theorem is proposed for both the secure two-party computation and the offline phase of the multi-party computation. The random encryption coefficient is designed to enhance the security of the two-party computation and simultaneously guarantee the convergence of the distributed optimization. The feasible range for the encryption coefficient is derived with a strict mathematical proof. Numerical simulations demonstrate the exactness, effectiveness, and scalability of the proposed privacy-preserving approach.
Synergising Hierarchical Data Centers and Power Networks: A Privacy-Preserving Approach
In the era of digitization, data centers have emerged as integral contributors sustaining our interlinked world, bearing responsibility for an increasing proportion of the world's energy consumption. To facilitate the their fast rollout while progressing towards net-zero energy systems, the synergy of hierarchical data centers (cloud-fog-edge) and power networks can play a pivotal role. However, existing centralized co-dispatch manners encroach on the privacy of different agents within the integrated systems, meanwhile suffering from the combinatorial explosion. In this research, we propose a near-optimal distributed privacy-preserving approach to solve the non-convex synergy (day-ahead co-dispatch) problem. The synergy problem is formulated as a mixed integer quadratically constrained quadratic programming considering both communication and energy conservation, where Lyapunov optimization is introduced to balance operating costs and uncertain communication delays. To mitigate impacts of the highly non-convex nature, the normalized multi-parametric disaggregation technique is leveraged to reformulate the problem into a mixed integer non-linear programming. To further overcome non-smoothness of the reformulated problem, the customized $\ell_1-$surrogate Lagrangian relaxation method with convergence guarantees is proposed to solve the problem in a distributed privacy-preserving manner. {The effectiveness, optimality, and scalability of the proposed methodologies for the synergy problem are validated via numerical simulations. Simulation results also indicate that computing tasks can be delayed and migrated within the hierarchical data centers, demonstrating the flexible resource allocation capabilities of the hierarchical data center architecture, further facilitating peak load balancing in the power network.
Byzantine-Resilient Distributed P2P Energy Trading via Spatial-Temporal Anomaly Detection
Distributed peer-to-peer (P2P) energy trading mandates an escalating coupling between the physical power network and communication network, necessitating high-frequency sharing of real-time data among prosumers. However, this data-sharing scheme renders the system vulnerable to various malicious behaviors, as Byzantine agents can initiate cyberattacks by injecting sophisticated false data. To better investigate the impacts of malicious Byzantine faults, this paper develops a fully distributed P2P energy trading model by accounting for the high-fidelity physical network constraints. To further detect Byzantine faults and mitigate their impacts on distributed P2P energy trading problem, we propose an online spatial-temporal anomaly detection approach by leveraging the tensor learning method, which is informed by the domain knowledge to enable awesome detection performance. Moreover, to enhance its computational efficiency, we further develop closed-form solutions for the proposed detection approach. Subsequently, we derive theoretical conditions for guaranteeing optimality and convergence of the distributed P2P energy trading problem with anomaly detection mechanisms. Results from numerical simulations validate the effectiveness, optimality, and scalability of the proposed approach.
Robot Operation of Home Appliances by Reading User Manuals
Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.
Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low
Efficient energy management in prosumer households is key to alleviating grid stress in an energy transition marked by electric vehicles (EV), renewable energies and battery storage. However, it is unclear how households optimize prosumer EV charging. Here we study real-world data from 90 households on fixed-rate electricity tariffs in German-speaking countries to investigate the potential of Deep Reinforcement Learning (DRL) and other control approaches (Rule-Based, Model Predictive Control) to manage the dynamic and uncertain environment of Home Energy Management (HEM) and optimize household charging patterns. The DRL agent efficiently aligns charging of EV and battery storage with photovoltaic (PV) surplus. We find that frequent EV charging transactions, early EV connections and PV surplus increase optimization potential. A detailed analysis of nine households (1 hour resolution, 1 year) demonstrates that high battery capacity facilitates self optimization; in this case further algorithmic control shows little value. In cases with relatively low battery capacity, algorithmic control with DRL improves energy management and cost savings by a relevant margin. This result is further corroborated by our simulation of a synthetic household. We conclude that prosumer households with optimization potential would profit from DRL, thus benefiting also the full electricity system and its decarbonization.
Numerical estimation of the lock-in domain of a DC/AC inverter
We estimate the lock-in domain of the origin of a current control system which is used in common DC/AC inverter designs. The system is a cascade connection of a 4-dimensional linear system (current controller, CC) followed by a two-dimensional nonlinear system (phase-locked loop, PLL). For the PLL, we construct a Lyapunov function via numerical approximation of its level curves. In combination with the quadratic Lyapunov function of the CC, it forms a vector Lyapunov function (VLF) for the overall system. A forward-invariant set of the VLF is found via numerical application of the comparison principle. By LaSalle's invariance principle, convergence to the origin is established.
comment: Submitted to the IFAC Symposium on Nonlinear Control Systems (NOLCOS) 2025
Stability of two-dimensional SISO LTI system with bounded feedback gain that has bounded derivative
We consider a two-dimensional SISO LTI system closed by uncertain linear feedback. The feedback gain is time-varying, bounded, and has a bounded derivative (both bounds are known). We investigate the asymptotic stability of this system under all admissible behaviors of the gain. Note that the situation is similar to the classical absolute stability problem of Lurie--Aizerman with two differences: linearity and derivative constraint. Our method of analysis is therefore inspired by the variational ideas of Pyatnitskii, Barabanov, Margaliot, and others developed for the absolute stability problem. We derive the Hamilton--Jacobi--Bellman equation for a function describing the "most unstable" of the possible portraits of the closed-loop system. A numerical method is proposed for solving the equation. Based on the solution, sufficient conditions are formulated for the asymptotic stability and instability. The method is applied to an equation arising from the analysis of a power electronics synchronization circuit.
comment: Submitted to the European Control Conference (ECC) 2025
Learning mechanical systems from real-world data using discrete forced Lagrangian dynamics
We introduce a data-driven method for learning the equations of motion of mechanical systems directly from position measurements, without requiring access to velocity data. This is particularly relevant in system identification tasks where only positional information is available, such as motion capture, pixel data or low-resolution tracking. Our approach takes advantage of the discrete Lagrange-d'Alembert principle and the forced discrete Euler-Lagrange equations to construct a physically grounded model of the system's dynamics. We decompose the dynamics into conservative and non-conservative components, which are learned separately using feed-forward neural networks. In the absence of external forces, our method reduces to a variational discretization of the action principle naturally preserving the symplectic structure of the underlying Hamiltonian system. We validate our approach on a variety of synthetic and real-world datasets, demonstrating its effectiveness compared to baseline methods. In particular, we apply our model to (1) measured human motion data and (2) latent embeddings obtained via an autoencoder trained on image sequences. We demonstrate that we can faithfully reconstruct and separate both the conservative and forced dynamics, yielding interpretable and physically consistent predictions.
(Un)supervised Learning of Maximal Lyapunov Functions
In this paper, we address the problem of discovering maximal Lyapunov functions, as a means of determining the region of attraction of a dynamical system. To this end, we design a novel neural network architecture, which we prove to be a universal approximator of (maximal) Lyapunov functions. The architecture combines a local quadratic approximation with the output of a neural network, which models global higher-order terms in the Taylor expansion. We formulate the problem of training the Lyapunov function as an unsupervised optimization problem with dynamical constraints, which can be solved leveraging techniques from physics-informed learning. We propose and analyze a tailored training algorithm, based on the primal-dual algorithm, that can efficiently solve the problem. Additionally, we show how the learning problem formulation can be adapted to integrate data, when available. We apply the proposed approach to different classes of systems, showing that it matches or outperforms state-of-the-art alternatives in the accuracy of the approximated regions of attraction.
Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity
We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance optimality while requiring the minimum number of samples theoretically possible. Our theoretical results and their practical implications are supported by numerical experiments.
comment: Update from v1: Proposition 1 has been revised. References have been updated
IAE Optimized PID Tuning via Second Order Step Response Target Matching
This paper presents SOSTIAE (Second-Order System Target IAE), a novel PID tuning method that combines IAE minimization with explicit transient response shaping for practical control applications. The algorithm generates optimal PID parameters by matching the closed-loop response to a target second-order system with user-defined settling time (Ts) and percent overshoot (PO), while maintaining the conventional IAE performance metric. Comparative evaluations on first to third-order systems demonstrate that SOSTIAE consistently outperforms MATLAB's proprietary pidtune function, achieving 47-67% lower overshoot and up to 26% better IAE performance for higher-order plants. The constrained optimization framework ensures physically realizable controllers by enforcing non-negative PID gains and stability criteria, addressing known limitations of unconstrained IAE methods. Results indicate that SOSTIAE provides engineers with a systematic alternative for PID tuning when transient specifications and practical implementation constraints are critical.
comment: 10 pages, 3 figures, 1 table
Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is accessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
comment: Accepted by IEEE RA-L
A Control-Oriented Simplified Single Particle Model with Grouped Parameter and Sensitivity Analysis for Lithium-Ion Batteries
Lithium-ion batteries are widely used in transportation, energy storage, and consumer electronics, driving the need for reliable battery management systems (BMS) for state estimation and control. The Single Particle Model (SPM) balances computational efficiency and accuracy but faces challenges in parameter estimation due to numerous parameters. Current SPM models using parabolic approximation introduce intermediate variables and hard to do parameter grouping. This study presents a control-oriented SPM reformulation that employs parameter grouping and parabolic approximation to simplify model parameters while using average and surface lithium-ion concentrations as model output. By parameter grouping, the original 17 parameters were reduced to 9 grouped parameters. The reformulated model achieves a reduced-order ordinary differential equation form while maintaining mathematical accuracy equivalent to the pre-grouped discretized SPM. Through Sobol sensitivity analysis under various current profiles, the grouped parameters were reduced from 9 to 6 highly sensitive parameters. Results demonstrate that estimating these 6 parameters achieves comparable practical accuracy to estimating all 9 parameters, with faster convergence. This control-oriented SPM enhances BMS applications by facilitating state estimation and control while reducing parameter estimation requirements.
comment: 30 pages, 4 figures
Convergence Analysis of Gradient Flow for Overparameterized LQR Formulations
Motivated by the growing use of artificial intelligence (AI) tools in control design, this paper analyses the intersection between results from gradient methods for the model-free linear quadratic regulator (LQR), and linear feedforward neural networks (LFFNNs), More specifically, it looks into the case where one wants to find a LFFNN feedback that minimizes a LQR cost. This paper starts by analyzing the structure of the gradient expression for the parameters of each layer, which implies a key conservation law of the system. This conservation law is then leveraged to generalize existing results on boundedness and global convergence of solutions to critical points, and invariance of the set of stabilizing networks under the training dynamics. This is followed by an analysis of the case where the LFFNN has a single hidden layer, for which the paper proves that the training converges not only to the critical points, but to the optimal feedback control law for all but a set of Lebesgue measure zero of the initializations. These theoretical results are followed by an extensive analysis of a simple version of the problem -- the ``vector case'' -- proving the theoretical properties of accelerated convergence and small-input input-to-state stability (ISS) for this simpler example. Finally, the paper presents numerical evidence of faster convergence of the training of general LFFNNs when compared to non-overparameterized formulations, showing that the acceleration of the solution is observable even when the gradient is not explicitly computed, but estimated from evaluations of the cost function.
Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling
Recently, neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications such as battery degradation neural network-based microgrid day-ahead energy scheduling, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This will create a neural network-embedded optimization problem; the use of nonlinear activation functions in the neural network will make such problems extremely hard to solve if not unsolvable. To address this emerging challenge, this paper investigated different methods for linearizing the nonlinear activation functions with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored for the ReLU activation function are developed, analyzed and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions
Computationally Efficient System Level Tube-MPC for Uncertain Systems
Tube-based model predictive control (MPC) is one of the principal robust control techniques for constrained linear systems affected by additive disturbances. While tube-based methods with online-computed tubes have been successfully applied to systems with additive disturbances, their application to systems affected by additional model uncertainties is challenging. This paper proposes a tube-based MPC method - named filter-based system level tube-MPC (SLTMPC) - which overapproximates both types of uncertainties with an online optimized disturbance set, while simultaneously computing the tube controller online. For the first time, we provide rigorous closed-loop guarantees for receding horizon control of such a MPC method. These guarantees are obtained by virtue of a new terminal controller design and an online optimized terminal set. To reduce the computational complexity of the proposed method, we additionally introduce an asynchronous computation scheme that separates the optimization of the tube controller and the nominal trajectory. Finally, we provide a comprehensive numerical evaluation of the proposed methods to demonstrate their effectiveness.
comment: 20 pages, 5 figures, Automatica 2025
Implicit predictors in regularized data-driven predictive control
We introduce the notion of implicit predictors, which characterize the input-(state)-output prediction behavior underlying a predictive control scheme, even if it is not explicitly enforced as an equality constraint (as in traditional model or subspace predictive control). To demonstrate this concept, we derive and analyze implicit predictors for some basic data-driven predictive control (DPC) schemes, which offers a new perspective on this popular approach that may form the basis for modified DPC schemes and further theoretical insights.
comment: This paper is a preprint of a contribution to the IEEE Control Systems Letters. 6 pages, 2 figures
Convex NMPC reformulations for a special class of nonlinear multi-input systems with application to rank-one bilinear networks
We show that a special class of (nonconvex) NMPC problems admits an exact solution by reformulating them as a finite number of convex subproblems, extending previous results to the multi-input case. Our approach is applicable to a special class of input-affine discrete-time systems, which includes a class of bilinear rank-one systems that is considered useful in modeling certain controlled networks. We illustrate our results with two numerical examples, including the aforementioned rank-one bilinear network.
comment: 8 pages, 4 Figures, submitted to 22nd World Congress of the International Federation of Automatic Control 2023
Distributed Safe Learning and Planning for Multi-robot Systems
This paper considers the problem of online multi-robot motion planning with general nonlinear dynamics subject to unknown external disturbances. We propose dSLAP, a distributed safe learning and planning framework that allows the robots to safely navigate through the environments by coupling online learning and motion planning. Gaussian process regression is used to online learn the disturbances with uncertainty quantification. The planning algorithm ensures collision avoidance against the learning uncertainty and utilizes set-valued analysis to achieve fast adaptation in response to the newly learned models. A set-valued model predictive control problem is then formulated and solved to return a control policy that balances between actively exploring the unknown disturbances and reaching goal regions. Sufficient conditions are established to guarantee the safety of the robots in the absence of backup policy. Monte Carlo simulations are conducted for evaluation.
Systems and Control (EESS)
Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks
Time series forecasting plays a critical role in domains such as energy, finance, and healthcare, where accurate predictions inform decision-making under uncertainty. Although Transformer-based models have demonstrated success in sequential modeling, their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure. In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures-Autoformer, Informer, and Patchtst-each evaluated through three architectural variants: Minimal, Standard, and Full, representing increasing levels of complexity and modeling capacity. We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions. Our analysis reveals consistent patterns across model families. To advance this landscape further, we introduce the Koopman-enhanced Transformer framework, Deep Koopformer, which integrates operator-theoretic latent state modeling to improve stability and interpretability. We demonstrate its efficacy on nonlinear and chaotic dynamical systems. Our results highlight Koopman based Transformer as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting in noisy and complex real-world conditions.
Efficient Gaussian Mixture Filters based on Transition Density Approximation
Gaussian mixture filters for nonlinear systems usually rely on severe approximations when calculating mixtures in the prediction and filtering step. Thus, offline approximations of noise densities by Gaussian mixture densities to reduce the approximation error have been proposed. This results in exponential growth in the number of components, requiring ongoing component reduction, which is computationally complex. In this paper, the key idea is to approximate the true transition density by an axis-aligned Gaussian mixture, where two different approaches are derived. These approximations automatically ensure a constant number of components in the posterior densities without the need for explicit reduction. In addition, they allow a trade-off between estimation quality and computational complexity.
comment: Submitted to the conference FUSION 2025
A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup
In cluttered spaces, such as forests, drone picking up a payload via an abseil claw is an open challenge, as the cable is likely tangled and blocked by the branches and obstacles. To address such a challenge, in this work, a cooperative aerial system is proposed, which consists of a payload drone and a dexterous rappelling end droid. The two ends are linked via a Kevlar tether cable. The end droid is actuated by four propellers, which enable mid-air dexterous adjustment of clawing angle and guidance of cable movement. To avoid tanglement and rappelling obstacles, a trajectory optimization method that integrates cable length constraints and dynamic feasibility is developed, which guarantees safe pickup. A tether cable dynamic model is established to evaluate real-time cable status, considering both taut and sagging conditions. Simulation and real-world experiments are conducted to demonstrate that the proposed system is capable of picking up payload in cluttered spaces. As a result, the end droid can reach the target point successfully under cable constraints and achieve passive retrieval during the lifting phase without propulsion, which enables effective and efficient aerial manipulation.
comment: Video: https://youtu.be/dKrmzPdnblY
Interpretable Augmented Physics-Based Model for Estimation and Tracking
State-space estimation and tracking rely on accurate dynamical models to perform well. However, obtaining an vaccurate dynamical model for complex scenarios or adapting to changes in the system poses challenges to the estimation process. Recently, augmented physics-based models (APBMs) appear as an appealing strategy to cope with these challenges where the composition of a small and adaptive neural network with known physics-based models (PBM) is learned on the fly following an augmented state-space estimation approach. A major issue when introducing data-driven components in such a scenario is the danger of compromising the meaning (or interpretability) of estimated states. In this work, we propose a novel constrained estimation strategy that constrains the APBM dynamics close to the PBM. The novel state-space constrained approach leads to more flexible ways to impose constraints than the traditional APBM approach. Our experiments with a radar-tracking scenario demonstrate different aspects of the proposed approach and the trade-offs inherent in the imposed constraints.
comment: Submitted to the FUSION 2025 conference
Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees
Open-weight LLM zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of 2x cost savings compared to existing LLM routing techniques.
comment: Preprint. Under review
Signed Angle Rigid Graphs for Network Localization and Formation Control
Graph rigidity theory studies the capability of a graph embedded in the Euclidean space to constrain its global geometric shape via local constraints among nodes and edges, and has been widely exploited in network localization and formation control. In recent years, the traditional rigidity theory has been extended by considering new types of local constraints such as bearing, angle, ratio of distance, etc. Among them, the signed angle constraint received extensive attention since it is practically measurable and independent of the global coordinate frame. However, existing studies on signed angle rigidity always consider special graph structures, which are actually not necessary. This paper presents a comprehensive combinatorial analysis in terms of graphs and angle index sets for signed angle rigidity. We show that Laman graphs equivalently characterize minimally signed angle rigid graphs. Moreover, we propose a method to construct the minimal set of signed angle constraints in a Laman graph to effectively ensure signed angle rigidity. These results are finally applied to distributed network localization and formation stabilization problems, respectively, where each agent only has access to signed angle measurements.
Persistently Exciting Online Feedback Optimization Controller with Minimal Perturbations
This paper develops a persistently exciting input generating Online Feedback Optimization (OFO) controller that estimates the sensitivity of a process ensuring minimal deviations from the descent direction while converging. This eliminates the need for random perturbations in feedback loop. The proposed controller is formulated as a bilevel optimization program, where a nonconvex full rank constraint is relaxed using linear constraints and penalization. The validation of the method is performed in a simulated scenario where multiple systems share a limited, costly resource for production optimization, simulating an oil and gas resource allocation problem. The method allows for less input perturbations while accurately estimating gradients, allowing faster convergence when the gradients are unknown. In the case study, the proposed method achieved the same profit compared to an OFO controller with random input perturbations, and $1.4\%$ higher profit compared to an OFO controller without input perturbations.
comment: Accepted to CCTA 2025
Optimizing Offshore Wind Integration through Multi-Terminal DC Grids: A Market-Based OPF Framework for the North Sea Interconnectors
Interconnecting price zones and remote renewable energy sources has emerged as a key solution to achieving climate goals. The objective of this work is to present a formulation that extends the base optimal power flow model with price zones constraints to forecast the operations of upcoming offshore wind developments integrated into a multi-terminal DC grid. A case study based on the 2030 development of the North Sea is used to exemplify the utilization of the formulation. Here, three cases are presented, one with the price as a parameter and the other two with the price as a variable dependent on power flows between price zones. The paper demonstrates that, for large power flows, it is necessary to include additional constraints beyond line limitations to accurately capture the effects of price zone exchanges.
comment: 6 pages, ACDC Global 2025 The 22nd IET International Conference on AC and DC Power Transmission
A fully automated urban PV parameterization framework for improved estimation of energy production profiles
Accurate parameterization of rooftop photovoltaic (PV) installations is critical for effective grid management and strategic large-scale solar deployment. The lack of high-fidelity datasets for PV configuration parameters often compels practitioners to rely on coarse assumptions, undermining both the temporal and numerical accuracy of large-scale PV performance modeling. This study introduces a fully automated framework that innovatively integrates remote sensing data, semantic segmentation, polygon-vector refinement, tilt-azimuth estimation, and module layout inference to produce a richly attributed GIS dataset of distributed PV. Applied to Eindhoven (the Netherlands), the method achieves a correlation ($R^2$) of 0.92 with Distribution System Operator (DSO) records, while capacity estimates for 73$\%$ of neighborhoods demonstrate agreement within a $\pm$25$\%$ margin of recorded data. Additionally, by accurately capturing actual system configuration parameters (e.g., tilt, azimuth, module layout) and seamlessly linking them to advanced performance models, the method yields more reliable PV energy generation forecasts within the distribution networks. Centering our experiments toward a high PV-penetration community, configuration-aware simulations help to reduce Mean Absolute Percentage Error (MAPE) of energy generation modeling by up to 160$\%$ compared to the conventional assumption-based approaches. Furthermore, owing to its modular design and reliance on readily available geospatial resources, the workflow can be extended across diverse regions, offering a scalable solution for robust urban solar integration.
comment: This manuscript has been submitted to Solar Energy for peer review. 42 pages, 15 figures
Chance-constrained Solar PV Hosting Capacity Assessment for Distribution Grids Using Gaussian Process and Logit Learning
Growing penetration of distributed generation such as solar PV can increase the risk of over-voltage in distribution grids, affecting network security. Therefore, assessment of the so-called, PV hosting capacity (HC) - the maximum amount of PV that a given grid can accommodate becomes an important practical problem. In this paper, we propose a novel chance-constrained HC estimation framework using Gaussian Process and Logit learning that can account for uncertainty and risk management. Also, we consider the assessment of HC under different voltage control strategies. Our results have demonstrated that the proposed models can achieve high accuracy levels of up to 93% in predicting nodal over-voltage events on IEEE 33-bus and 123-bus test-cases. Thus, these models can be effectively employed to estimate the chance-constrained HC with various risk levels. Moreover, our proposed methods have simple forms and low computational costs of only a few seconds.
Scalable quantile predictions of peak loads for non-residential customer segments
Electrical grid congestion has emerged as an immense challenge in Europe, making the forecasting of load and its associated metrics increasingly crucial. Among these metrics, peak load is fundamental. Non-time-resolved models of peak load have their advantages of being simple and compact, and among them Velander's formula (VF) is widely used in distribution network planning. However, several aspects of VF remain inadequately addressed, including year-ahead prediction, scaling of customers, aggregation, and, most importantly, the lack of probabilistic elements. The present paper proposes a quantile interpretation of VF that enables VF to learn truncated cumulative distribution functions of peak loads with multiple quantile regression under non-crossing constraints. The evaluations on non-residential customer data confirmed its ability to predict peak load year ahead, to fit customers with a wide range of electricity consumptions, and to model aggregations of customers. A noteworthy finding is that for a given electricity consumption, aggregations of customers have statistically larger peak loads than a single customer.
comment: submitted to IEEE PES ISGT (Innovative Smart Grid Technologies) Europe 2025 conference
Range Space or Null Space: Least-Squares Methods for the Realization Problem
This contribution revisits the classical approximate realization problem, which involves determining matrices of a state-space model based on estimates of a truncated series of Markov parameters. A Hankel matrix built up by these Markov parameters plays a fundamental role in this problem, leveraging the fact that both its range space and left null space encode critical information about the state-space model. We examine two prototype realization algorithms based on the Hankel matrix: the classical range-space-based (SVD-based) method and the more recent null-space-based method. It is demonstrated that the range-space-based method corresponds to a total least-squares solution, whereas the null-space-based method corresponds to an ordinary least-squares solution. By analyzing the differences in sensitivity of the two algorithms, we determine the conditions when one or the other realization algorithm is to be preferred, and identify factors that contribute to an ill-conditioned realization problem. Furthermore, recognizing that both methods are suboptimal, we argue that the optimal realization is obtained through a weighted least-squares approach. A statistical analysis of these methods, including their consistency and asymptotic normality is also provided.
Customising Electricity Contracts at Scale with Large Language Models
The electricity system becomes more complex, connecting massive numbers of end-users and distributed generators. Adding or removing grid connections requires expert studies to align technical constraints with user requests. In times of labour shortages, carrying out these studies represents a significant amount of time that engineers at system operators spend in planning departments. As time is limited, only standard block connectivity contracts can be offered to end-users, or the requests pile up. Even if offers are made, these often do not perfectly match the user's requirements, leading to overpaying or underusing the grid capacity. This paper investigates whether end-users can negotiate individual, flexible time-of-use contracts directly with the grid using Large Language Models (LLM) in chats at scale. The LLM-based chat has direct access to a model of the grid and studies the grid's technical constraints just as an expert engineer. The advantage of this system is that end-users can directly interact with grid models through natural language; no intermediate is needed to service, analyse, study, assess, advise, consult and engineer. This initial study paves the way toward developing this tailored LLM system, resulting in possible high-efficiency gains for grid planning and customer management.
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures ICML 2025
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), particularly challenging when constraint information is only locally available and uncertain. Existing approaches requiring global constraints or using probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (by using, e.g., sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
comment: Accepted by ICML 2025
Synchronous Models and Fundamental Systems in Observer Design
This paper introduces the concept of a synchronous model as an extension of the internal model concept used in observer design for dynamical systems. A system is said to contain a synchronous model of another if there is a suitable error function between the two systems that remains stationary for all of the trajectories of the two systems. A system is said to admit a synchronous lift if a second system containing a synchronous model exists. We provide necessary and sufficient conditions that a system admits a synchronous lift and provide a method to construct a (there may be many) lifted system should one exist. We characterise the class of all systems that admit a synchronous lift by showing that they consist of fundamental vector fields induced by a Lie group action, a class of system we term fundamental systems. For fundamental systems we propose a simple synchronous observer design methodology, for which we show how correction terms can be discretised and combined easily, facilitating global characterisation of convergence and performance. Finally, we provide three examples to demonstrate the key concepts of synchrony, symmetry construction, and observer design for a fundamental system.
comment: 19 pages, 3 figures, submitted to IEEE TAC
LLA-MPC: Fast Adaptive Control for Autonomous Racing
We present Look-Back and Look-Ahead Adaptive Model Predictive Control (LLA-MPC), a real-time adaptive control framework for autonomous racing that addresses the challenge of rapidly changing tire-surface interactions. Unlike existing approaches requiring substantial data collection or offline training, LLA-MPC employs a model bank for immediate adaptation without a learning period. It integrates two key mechanisms: a look-back window that evaluates recent vehicle behavior to select the most accurate model and a look-ahead horizon that optimizes trajectory planning based on the identified dynamics. The selected model and estimated friction coefficient are then incorporated into a trajectory planner to optimize reference paths in real-time. Experiments across diverse racing scenarios demonstrate that LLA-MPC outperforms state-of-the-art methods in adaptation speed and handling, even during sudden friction transitions. Its learning-free, computationally efficient design enables rapid adaptation, making it ideal for high-speed autonomous racing in multi-surface environments.
VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Direct Pseudospectral Optimal Control by Orthogonal Polynomial Integral Collocation
This paper details a methodology to transcribe an optimal control problem into a nonlinear program for generation of the trajectories that optimize a given functional by approximating only the highest order derivatives of a given system's dynamics. The underlying method uses orthogonal polynomial integral collocation by which successive integrals are taken to approximate all lower order states. Hence, one set of polynomial coefficients can represent an entire coordinate's degree of freedom. Specifically, Chebyshev polynomials of the first and second kind and Legendre polynomials are used over their associated common interpolating grids derived from the bases' roots and extrema. Simple example problems compare different polynomial bases' performance to analytical solutions. The planar circular orbit raising problem is used to verify the method with solutions obtained by other pseudospectral methods in literature. Finally, a rocket landing flip maneuver problem is solved to demonstrate the ability to solve complex problems with multiple states and control variables with constraints. Simulations establish this method's performance, and reveal that the polynomial/node choice for a given problem notably affects the performance.
comment: Preprint submitted to the Journal of Guidance, Control, and Dynamics for publication
Split-as-a-Pro: behavioral control via operator splitting and alternating projections
The paper introduces Split-as-a-Pro, a control framework that integrates behavioral systems theory, operator splitting methods, and alternating projection algorithms. The framework reduces dynamic optimization problems - arising in both control and estimation - to efficient projection computations. Split-as-a-Pro builds on a non-parametric formulation that exploits system structure to separate dynamic constraints imposed by individual subsystems from external ones, such as interconnection constraints and input/output constraints. This enables the use of arbitrary system representations, as long as the associated projection is efficiently computable, thereby enhancing scalability and compatibility with gray-box modeling. We demonstrate the effectiveness of Split-as-a-Pro by developing a distributed algorithm for solving finite-horizon linear quadratic control problems and illustrate its use in predictive control. Our numerical case studies show that algorithms obtained using Split-as-a-Pro significantly outperform their centralized counterparts in runtime and scalability across various standard graph topologies, while seamlessly leveraging both model-based and data-driven system representations.
Alignment of large language models with constrained learning
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM policies at near-optimal dual variables with respect to both the objective and the constraint functions. These results prove that dual-based alignment methods can find an optimal constrained LLM policy, up to an LLM parametrization gap. We demonstrate the effectiveness and merits of our approach through extensive experiments conducted on the PKU-SafeRLHF dataset.
comment: 48 pages, 7 figures, 7 tables
Privacy-Preserving Peer-to-Peer Energy Trading via Hybrid Secure Computations
The massive integration of uncertain distributed renewable energy resources into power systems raises power imbalance concerns. Peer-to-peer (P2P) energy trading provides a promising way to balance the prosumers' volatile energy power generation and demands locally. Particularly, to protect the privacy of prosumers, distributed P2P energy trading is broadly advocated. However, severe privacy leakage issues can emerge in the realistic fully distributed P2P energy trading paradigm. Meanwhile, in this paradigm, two-party and multi-party computations coexist, challenging the naive privacy-preserving techniques. To tackle privacy leakage issues arising from the fully distributed P2P energy trading, this paper proposes a privacy-preserving approach via hybrid secure computations. A secure multi-party computation mechanism consisting of offline and online phases is developed to ensure the security of shared data by leveraging the tailored secret sharing method. In addition, the Paillier encryption method based on the Chinese Remainder Theorem is proposed for both the secure two-party computation and the offline phase of the multi-party computation. The random encryption coefficient is designed to enhance the security of the two-party computation and simultaneously guarantee the convergence of the distributed optimization. The feasible range for the encryption coefficient is derived with a strict mathematical proof. Numerical simulations demonstrate the exactness, effectiveness, and scalability of the proposed privacy-preserving approach.
Synergising Hierarchical Data Centers and Power Networks: A Privacy-Preserving Approach
In the era of digitization, data centers have emerged as integral contributors sustaining our interlinked world, bearing responsibility for an increasing proportion of the world's energy consumption. To facilitate the their fast rollout while progressing towards net-zero energy systems, the synergy of hierarchical data centers (cloud-fog-edge) and power networks can play a pivotal role. However, existing centralized co-dispatch manners encroach on the privacy of different agents within the integrated systems, meanwhile suffering from the combinatorial explosion. In this research, we propose a near-optimal distributed privacy-preserving approach to solve the non-convex synergy (day-ahead co-dispatch) problem. The synergy problem is formulated as a mixed integer quadratically constrained quadratic programming considering both communication and energy conservation, where Lyapunov optimization is introduced to balance operating costs and uncertain communication delays. To mitigate impacts of the highly non-convex nature, the normalized multi-parametric disaggregation technique is leveraged to reformulate the problem into a mixed integer non-linear programming. To further overcome non-smoothness of the reformulated problem, the customized $\ell_1-$surrogate Lagrangian relaxation method with convergence guarantees is proposed to solve the problem in a distributed privacy-preserving manner. {The effectiveness, optimality, and scalability of the proposed methodologies for the synergy problem are validated via numerical simulations. Simulation results also indicate that computing tasks can be delayed and migrated within the hierarchical data centers, demonstrating the flexible resource allocation capabilities of the hierarchical data center architecture, further facilitating peak load balancing in the power network.
Byzantine-Resilient Distributed P2P Energy Trading via Spatial-Temporal Anomaly Detection
Distributed peer-to-peer (P2P) energy trading mandates an escalating coupling between the physical power network and communication network, necessitating high-frequency sharing of real-time data among prosumers. However, this data-sharing scheme renders the system vulnerable to various malicious behaviors, as Byzantine agents can initiate cyberattacks by injecting sophisticated false data. To better investigate the impacts of malicious Byzantine faults, this paper develops a fully distributed P2P energy trading model by accounting for the high-fidelity physical network constraints. To further detect Byzantine faults and mitigate their impacts on distributed P2P energy trading problem, we propose an online spatial-temporal anomaly detection approach by leveraging the tensor learning method, which is informed by the domain knowledge to enable awesome detection performance. Moreover, to enhance its computational efficiency, we further develop closed-form solutions for the proposed detection approach. Subsequently, we derive theoretical conditions for guaranteeing optimality and convergence of the distributed P2P energy trading problem with anomaly detection mechanisms. Results from numerical simulations validate the effectiveness, optimality, and scalability of the proposed approach.
Robot Operation of Home Appliances by Reading User Manuals
Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.
Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low
Efficient energy management in prosumer households is key to alleviating grid stress in an energy transition marked by electric vehicles (EV), renewable energies and battery storage. However, it is unclear how households optimize prosumer EV charging. Here we study real-world data from 90 households on fixed-rate electricity tariffs in German-speaking countries to investigate the potential of Deep Reinforcement Learning (DRL) and other control approaches (Rule-Based, Model Predictive Control) to manage the dynamic and uncertain environment of Home Energy Management (HEM) and optimize household charging patterns. The DRL agent efficiently aligns charging of EV and battery storage with photovoltaic (PV) surplus. We find that frequent EV charging transactions, early EV connections and PV surplus increase optimization potential. A detailed analysis of nine households (1 hour resolution, 1 year) demonstrates that high battery capacity facilitates self optimization; in this case further algorithmic control shows little value. In cases with relatively low battery capacity, algorithmic control with DRL improves energy management and cost savings by a relevant margin. This result is further corroborated by our simulation of a synthetic household. We conclude that prosumer households with optimization potential would profit from DRL, thus benefiting also the full electricity system and its decarbonization.
Numerical estimation of the lock-in domain of a DC/AC inverter
We estimate the lock-in domain of the origin of a current control system which is used in common DC/AC inverter designs. The system is a cascade connection of a 4-dimensional linear system (current controller, CC) followed by a two-dimensional nonlinear system (phase-locked loop, PLL). For the PLL, we construct a Lyapunov function via numerical approximation of its level curves. In combination with the quadratic Lyapunov function of the CC, it forms a vector Lyapunov function (VLF) for the overall system. A forward-invariant set of the VLF is found via numerical application of the comparison principle. By LaSalle's invariance principle, convergence to the origin is established.
comment: Submitted to the IFAC Symposium on Nonlinear Control Systems (NOLCOS) 2025
Stability of two-dimensional SISO LTI system with bounded feedback gain that has bounded derivative
We consider a two-dimensional SISO LTI system closed by uncertain linear feedback. The feedback gain is time-varying, bounded, and has a bounded derivative (both bounds are known). We investigate the asymptotic stability of this system under all admissible behaviors of the gain. Note that the situation is similar to the classical absolute stability problem of Lurie--Aizerman with two differences: linearity and derivative constraint. Our method of analysis is therefore inspired by the variational ideas of Pyatnitskii, Barabanov, Margaliot, and others developed for the absolute stability problem. We derive the Hamilton--Jacobi--Bellman equation for a function describing the "most unstable" of the possible portraits of the closed-loop system. A numerical method is proposed for solving the equation. Based on the solution, sufficient conditions are formulated for the asymptotic stability and instability. The method is applied to an equation arising from the analysis of a power electronics synchronization circuit.
comment: Submitted to the European Control Conference (ECC) 2025
Learning mechanical systems from real-world data using discrete forced Lagrangian dynamics
We introduce a data-driven method for learning the equations of motion of mechanical systems directly from position measurements, without requiring access to velocity data. This is particularly relevant in system identification tasks where only positional information is available, such as motion capture, pixel data or low-resolution tracking. Our approach takes advantage of the discrete Lagrange-d'Alembert principle and the forced discrete Euler-Lagrange equations to construct a physically grounded model of the system's dynamics. We decompose the dynamics into conservative and non-conservative components, which are learned separately using feed-forward neural networks. In the absence of external forces, our method reduces to a variational discretization of the action principle naturally preserving the symplectic structure of the underlying Hamiltonian system. We validate our approach on a variety of synthetic and real-world datasets, demonstrating its effectiveness compared to baseline methods. In particular, we apply our model to (1) measured human motion data and (2) latent embeddings obtained via an autoencoder trained on image sequences. We demonstrate that we can faithfully reconstruct and separate both the conservative and forced dynamics, yielding interpretable and physically consistent predictions.
(Un)supervised Learning of Maximal Lyapunov Functions
In this paper, we address the problem of discovering maximal Lyapunov functions, as a means of determining the region of attraction of a dynamical system. To this end, we design a novel neural network architecture, which we prove to be a universal approximator of (maximal) Lyapunov functions. The architecture combines a local quadratic approximation with the output of a neural network, which models global higher-order terms in the Taylor expansion. We formulate the problem of training the Lyapunov function as an unsupervised optimization problem with dynamical constraints, which can be solved leveraging techniques from physics-informed learning. We propose and analyze a tailored training algorithm, based on the primal-dual algorithm, that can efficiently solve the problem. Additionally, we show how the learning problem formulation can be adapted to integrate data, when available. We apply the proposed approach to different classes of systems, showing that it matches or outperforms state-of-the-art alternatives in the accuracy of the approximated regions of attraction.
Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity
We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance optimality while requiring the minimum number of samples theoretically possible. Our theoretical results and their practical implications are supported by numerical experiments.
comment: Update from v1: Proposition 1 has been revised. References have been updated
IAE Optimized PID Tuning via Second Order Step Response Target Matching
This paper presents SOSTIAE (Second-Order System Target IAE), a novel PID tuning method that combines IAE minimization with explicit transient response shaping for practical control applications. The algorithm generates optimal PID parameters by matching the closed-loop response to a target second-order system with user-defined settling time (Ts) and percent overshoot (PO), while maintaining the conventional IAE performance metric. Comparative evaluations on first to third-order systems demonstrate that SOSTIAE consistently outperforms MATLAB's proprietary pidtune function, achieving 47-67% lower overshoot and up to 26% better IAE performance for higher-order plants. The constrained optimization framework ensures physically realizable controllers by enforcing non-negative PID gains and stability criteria, addressing known limitations of unconstrained IAE methods. Results indicate that SOSTIAE provides engineers with a systematic alternative for PID tuning when transient specifications and practical implementation constraints are critical.
comment: 10 pages, 3 figures, 1 table
Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is accessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
comment: Accepted by IEEE RA-L
A Control-Oriented Simplified Single Particle Model with Grouped Parameter and Sensitivity Analysis for Lithium-Ion Batteries
Lithium-ion batteries are widely used in transportation, energy storage, and consumer electronics, driving the need for reliable battery management systems (BMS) for state estimation and control. The Single Particle Model (SPM) balances computational efficiency and accuracy but faces challenges in parameter estimation due to numerous parameters. Current SPM models using parabolic approximation introduce intermediate variables and hard to do parameter grouping. This study presents a control-oriented SPM reformulation that employs parameter grouping and parabolic approximation to simplify model parameters while using average and surface lithium-ion concentrations as model output. By parameter grouping, the original 17 parameters were reduced to 9 grouped parameters. The reformulated model achieves a reduced-order ordinary differential equation form while maintaining mathematical accuracy equivalent to the pre-grouped discretized SPM. Through Sobol sensitivity analysis under various current profiles, the grouped parameters were reduced from 9 to 6 highly sensitive parameters. Results demonstrate that estimating these 6 parameters achieves comparable practical accuracy to estimating all 9 parameters, with faster convergence. This control-oriented SPM enhances BMS applications by facilitating state estimation and control while reducing parameter estimation requirements.
comment: 30 pages, 4 figures
Convergence Analysis of Gradient Flow for Overparameterized LQR Formulations
Motivated by the growing use of artificial intelligence (AI) tools in control design, this paper analyses the intersection between results from gradient methods for the model-free linear quadratic regulator (LQR), and linear feedforward neural networks (LFFNNs), More specifically, it looks into the case where one wants to find a LFFNN feedback that minimizes a LQR cost. This paper starts by analyzing the structure of the gradient expression for the parameters of each layer, which implies a key conservation law of the system. This conservation law is then leveraged to generalize existing results on boundedness and global convergence of solutions to critical points, and invariance of the set of stabilizing networks under the training dynamics. This is followed by an analysis of the case where the LFFNN has a single hidden layer, for which the paper proves that the training converges not only to the critical points, but to the optimal feedback control law for all but a set of Lebesgue measure zero of the initializations. These theoretical results are followed by an extensive analysis of a simple version of the problem -- the ``vector case'' -- proving the theoretical properties of accelerated convergence and small-input input-to-state stability (ISS) for this simpler example. Finally, the paper presents numerical evidence of faster convergence of the training of general LFFNNs when compared to non-overparameterized formulations, showing that the acceleration of the solution is observable even when the gradient is not explicitly computed, but estimated from evaluations of the cost function.
Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling
Recently, neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications such as battery degradation neural network-based microgrid day-ahead energy scheduling, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This will create a neural network-embedded optimization problem; the use of nonlinear activation functions in the neural network will make such problems extremely hard to solve if not unsolvable. To address this emerging challenge, this paper investigated different methods for linearizing the nonlinear activation functions with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored for the ReLU activation function are developed, analyzed and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions
Computationally Efficient System Level Tube-MPC for Uncertain Systems
Tube-based model predictive control (MPC) is one of the principal robust control techniques for constrained linear systems affected by additive disturbances. While tube-based methods with online-computed tubes have been successfully applied to systems with additive disturbances, their application to systems affected by additional model uncertainties is challenging. This paper proposes a tube-based MPC method - named filter-based system level tube-MPC (SLTMPC) - which overapproximates both types of uncertainties with an online optimized disturbance set, while simultaneously computing the tube controller online. For the first time, we provide rigorous closed-loop guarantees for receding horizon control of such a MPC method. These guarantees are obtained by virtue of a new terminal controller design and an online optimized terminal set. To reduce the computational complexity of the proposed method, we additionally introduce an asynchronous computation scheme that separates the optimization of the tube controller and the nominal trajectory. Finally, we provide a comprehensive numerical evaluation of the proposed methods to demonstrate their effectiveness.
comment: 20 pages, 5 figures, Automatica 2025
Implicit predictors in regularized data-driven predictive control
We introduce the notion of implicit predictors, which characterize the input-(state)-output prediction behavior underlying a predictive control scheme, even if it is not explicitly enforced as an equality constraint (as in traditional model or subspace predictive control). To demonstrate this concept, we derive and analyze implicit predictors for some basic data-driven predictive control (DPC) schemes, which offers a new perspective on this popular approach that may form the basis for modified DPC schemes and further theoretical insights.
comment: This paper is a preprint of a contribution to the IEEE Control Systems Letters. 6 pages, 2 figures
Convex NMPC reformulations for a special class of nonlinear multi-input systems with application to rank-one bilinear networks
We show that a special class of (nonconvex) NMPC problems admits an exact solution by reformulating them as a finite number of convex subproblems, extending previous results to the multi-input case. Our approach is applicable to a special class of input-affine discrete-time systems, which includes a class of bilinear rank-one systems that is considered useful in modeling certain controlled networks. We illustrate our results with two numerical examples, including the aforementioned rank-one bilinear network.
comment: 8 pages, 4 Figures, submitted to 22nd World Congress of the International Federation of Automatic Control 2023
Distributed Safe Learning and Planning for Multi-robot Systems
This paper considers the problem of online multi-robot motion planning with general nonlinear dynamics subject to unknown external disturbances. We propose dSLAP, a distributed safe learning and planning framework that allows the robots to safely navigate through the environments by coupling online learning and motion planning. Gaussian process regression is used to online learn the disturbances with uncertainty quantification. The planning algorithm ensures collision avoidance against the learning uncertainty and utilizes set-valued analysis to achieve fast adaptation in response to the newly learned models. A set-valued model predictive control problem is then formulated and solved to return a control policy that balances between actively exploring the unknown disturbances and reaching goal regions. Sufficient conditions are established to guarantee the safety of the robots in the absence of backup policy. Monte Carlo simulations are conducted for evaluation.
Robotics
Towards Humanoid Robot Autonomy: A Dynamic Architecture Integrating Continuous thought Machines (CTM) and Model Context Protocol (MCP)
To address the gaps between the static pre-set "thinking-planning-action" of humanoid robots in unfamiliar scenarios and the highly programmed "call tool-return result" due to the lack of autonomous coding capabilities, this work designs a dynamic architecture connecting continuous thought machines (CTM) and model context protocol (MCP). It proposes a theoretical parallel solution through tick-slab and uses rank compression to achieve parameter suppression to provide a solution for achieving autonomous actions due to autonomous coding. The researcher used a simulation-based experiment using OpenAI's o4-mini-high as a tool to build the experimental environment, and introduced the extended SayCan dataset to conduct nine epochs of experiments. The experimental results show that the CTM-MCP architecture is feasible and effective through the data results of seven metrics: task success rate (TSR), execution success rate (ESR), average episode length (AEL), ROSCOE, REVEAL, proficiency self-assessment (PSA), task effectiveness (TE). In practice, it provides a reference experience for exploring the autonomous dynamic coding of humanoid robots based on continuous thinking to achieve human-like autonomous actions.
comment: The relevant architecture code and some experimental records have been uploaded to the GitHub repository for sharing: https://github.com/brucewang123456789/GeniusTrail/tree/main/CTM-MCP
Deriving The Fundamental Equation of Earthmoving and Configuring Vortex Studio Earthmoving Simulation for Soil Property Estimation Experimentation
This document serves as supplementary material for two International Society for Terrain-Vehicle Systems conference publications regarding in situ soil property estimation by Wagner et al. in 2023 and 2025. It covers the derivation of the fundamental equation of earthmoving for a flat blade moving through sloped soil and provides some information regarding the advanced configuration of Vortex Studio's soil-tool interaction simulation.
Passive Vibration Control of a 3-D Printer Gantry
Improved additive manufacturing capabilities are vital for the future development and improvement of ubiquitous robotic systems. These machines can be integrated into existing robotic systems to allow manufacturing and repair of components, as well as fabrication of custom parts for the robots themselves. The fused filament fabrication (FFF) process is one of the most common and well-developed AM processes but suffers from the effects of vibration-induced position error, particularly as the printing speed is raised. This project adapted and expanded a dynamic model of an FFF gantry system to include a passive spring-mass-damper system controller attached to the extruder carriage and tuned using optimal parameters. A case study was conducted to demonstrate the effects and generate recommendations for implementation. This work is also valuable for other mechatronic systems which operate using an open-loop control system and which suffer from vibration, including numerous robotic systems, pick-and-place machines, positioners, and similar.
comment: 7 pages, 4 figures, 2 tables, 31 equations
From Single Images to Motion Policies via Video-Generation Environment Representations
Autonomous robots typically need to construct representations of their surroundings and adapt their motions to the geometry of their environment. Here, we tackle the problem of constructing a policy model for collision-free motion generation, consistent with the environment, from a single input RGB image. Extracting 3D structures from a single image often involves monocular depth estimation. Developments in depth estimation have given rise to large pre-trained models such as DepthAnything. However, using outputs of these models for downstream motion generation is challenging due to frustum-shaped errors that arise. Instead, we propose a framework known as Video-Generation Environment Representation (VGER), which leverages the advances of large-scale video generation models to generate a moving camera video conditioned on the input image. Frames of this video, which form a multiview dataset, are then input into a pre-trained 3D foundation model to produce a dense point cloud. We then introduce a multi-scale noise approach to train an implicit representation of the environment structure and build a motion generation model that complies with the geometry of the representation. We extensively evaluate VGER over a diverse set of indoor and outdoor environments. We demonstrate its ability to produce smooth motions that account for the captured geometry of a scene, all from a single RGB input image.
Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Modern policy gradient algorithms, such as TRPO and PPO, outperform vanilla policy gradient in many RL tasks. Questioning the common belief that enforcing approximate trust regions leads to steady policy improvement in practice, we show that the more critical factor is the enhanced value estimation accuracy from more value update steps in each iteration. To demonstrate, we show that by simply increasing the number of value update steps per iteration, vanilla policy gradient itself can achieve performance comparable to or better than PPO in all the standard continuous control benchmark environments. Importantly, this simple change to vanilla policy gradient is significantly more robust to hyperparameter choices, opening up the possibility that RL algorithms may still become more effective and easier to use.
comment: 15 pages and 21 figures
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $\epsilon$ sub-optimality and feasible policy after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search, thus, we reduce the computation time by at least 4x for smaller value of discount factor ($\gamma$) and by at least 6x for larger value of $\gamma$.
Sensorimotor features of self-awareness in multimodal large language models
Self-awareness - the ability to distinguish oneself from the surrounding environment - underpins intelligent, autonomous behavior. Recent advances in AI achieve human-like performance in tasks integrating multimodal information, particularly in large language models, raising interest in the embodiment capabilities of AI agents on nonhuman platforms such as robots. Here, we explore whether multimodal LLMs can develop self-awareness solely through sensorimotor experiences. By integrating a multimodal LLM into an autonomous mobile robot, we test its ability to achieve this capacity. We find that the system exhibits robust environmental awareness, self-recognition and predictive awareness, allowing it to infer its robotic nature and motion characteristics. Structural equation modeling reveals how sensory integration influences distinct dimensions of self-awareness and its coordination with past-present memory, as well as the hierarchical internal associations that drive self-identification. Ablation tests of sensory inputs identify critical modalities for each dimension, demonstrate compensatory interactions among sensors and confirm the essential role of structured and episodic memory in coherent reasoning. These findings demonstrate that, given appropriate sensory information about the world and itself, multimodal LLMs exhibit emergent self-awareness, opening the door to artificial embodied cognitive systems.
comment: 16 pages, 3 figures, 1 table
Learning the Contact Manifold for Accurate Pose Estimation During Peg-in-Hole Insertion of Complex Geometries ICRA 2025
Contact-rich assembly of complex, non-convex parts with tight tolerances remains a formidable challenge. Purely model-based methods struggle with discontinuous contact dynamics, while model-free methods require vast data and often lack precision. In this work, we introduce a hybrid framework that uses only contact-state information between a complex peg and its mating hole to recover the full SE(3) pose during assembly. In under 10 seconds of online execution, a sequence of primitive probing motions constructs a local contact submanifold, which is then aligned to a precomputed offline contact manifold to yield sub-mm and sub-degree pose estimates. To eliminate costly k-NN searches, we train a lightweight network that projects sparse contact observations onto the contact manifold and is 95x faster and 18% more accurate. Our method, evaluated on three industrially relevant geometries with clearances of 0.1-1.0 mm, achieves a success rate of 93.3%, a 4.1x improvement compared to primitive-only strategies without state estimation.
comment: Accepted and presented in IEEE ICRA 2025 Contact-Rich Manipulation Workshop (https://contact-rich.github.io/). Awarded Best Paper Runner-Up. 5 pages with 1 figure
Omni-Perception: Omnidirectional Collision Avoidance for Legged Locomotion in Dynamic Environments
Agile locomotion in complex 3D environments requires robust spatial awareness to safely avoid diverse obstacles such as aerial clutter, uneven terrain, and dynamic agents. Depth-based perception approaches often struggle with sensor noise, lighting variability, computational overhead from intermediate representations (e.g., elevation maps), and difficulties with non-planar obstacles, limiting performance in unstructured environments. In contrast, direct integration of LiDAR sensing into end-to-end learning for legged locomotion remains underexplored. We propose Omni-Perception, an end-to-end locomotion policy that achieves 3D spatial awareness and omnidirectional collision avoidance by directly processing raw LiDAR point clouds. At its core is PD-RiskNet (Proximal-Distal Risk-Aware Hierarchical Network), a novel perception module that interprets spatio-temporal LiDAR data for environmental risk assessment. To facilitate efficient policy learning, we develop a high-fidelity LiDAR simulation toolkit with realistic noise modeling and fast raycasting, compatible with platforms such as Isaac Gym, Genesis, and MuJoCo, enabling scalable training and effective sim-to-real transfer. Learning reactive control policies directly from raw LiDAR data enables the robot to navigate complex environments with static and dynamic obstacles more robustly than approaches relying on intermediate maps or limited sensing. We validate Omni-Perception through real-world experiments and extensive simulation, demonstrating strong omnidirectional avoidance capabilities and superior locomotion performance in highly dynamic environments. We will open-source our code and models.
SPADE: Towards Scalable Path Planning Architecture on Actionable Multi-Domain 3D Scene Graphs IROS 2025
In this work, we introduce SPADE, a path planning framework designed for autonomous navigation in dynamic environments using 3D scene graphs. SPADE combines hierarchical path planning with local geometric awareness to enable collision-free movement in dynamic scenes. The framework bifurcates the planning problem into two: (a) solving the sparse abstract global layer plan and (b) iterative path refinement across denser lower local layers in step with local geometric scene navigation. To ensure efficient extraction of a feasible route in a dense multi-task domain scene graphs, the framework enforces informed sampling of traversable edges prior to path-planning. This removes extraneous information not relevant to path-planning and reduces the overall planning complexity over a graph. Existing approaches address the problem of path planning over scene graphs by decoupling hierarchical and geometric path evaluation processes. Specifically, this results in an inefficient replanning over the entire scene graph when encountering path obstructions blocking the original route. In contrast, SPADE prioritizes local layer planning coupled with local geometric scene navigation, enabling navigation through dynamic scenes while maintaining efficiency in computing a traversable route. We validate SPADE through extensive simulation experiments and real-world deployment on a quadrupedal robot, demonstrating its efficacy in handling complex and dynamic scenarios.
comment: Submitted to IROS 2025
MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation
Humans interact with their world while leveraging precise full-body control to achieve versatile goals. This versatility allows them to solve long-horizon, underspecified problems, such as placing a cup in a sink, by seamlessly sequencing actions like approaching the cup, grasping, transporting it, and finally placing it in the sink. Such goal-driven control can enable new procedural tools for animation systems, enabling users to define partial objectives while the system naturally ``fills in'' the intermediate motions. However, while current methods for whole-body dexterous manipulation in physics-based animation achieve success in specific interaction tasks, they typically employ control paradigms (e.g., detailed kinematic motion tracking, continuous object trajectory following, or direct VR teleoperation) that offer limited versatility for high-level goal specification across the entire coupled human-object system. To bridge this gap, we present MaskedManipulator, a unified and generative policy developed through a two-stage learning approach. First, our system trains a tracking controller to physically reconstruct complex human-object interactions from large-scale human mocap datasets. This tracking controller is then distilled into MaskedManipulator, which provides users with intuitive control over both the character's body and the manipulated object. As a result, MaskedManipulator enables users to specify complex loco-manipulation tasks through intuitive high-level objectives (e.g., target object poses, key character stances), and MaskedManipulator then synthesizes the necessary full-body actions for a physically simulated humanoid to achieve these goals, paving the way for more interactive and life-like virtual characters.
ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning
Vision-Language-Action (VLA) models have gained much attention from the research community thanks to their strength in translating multimodal observations with linguistic instructions into robotic actions. Despite their recent advancements, VLAs often overlook the explicit reasoning and only learn the functional input-action mappings, omitting these crucial logical steps for interpretability and generalization for complex, long-horizon manipulation tasks. In this work, we propose \textit{ReFineVLA}, a multimodal reasoning-aware framework that fine-tunes VLAs with teacher-guided reasons. We first augment robotic datasets with reasoning rationales generated by an expert teacher model, guiding VLA models to learn to reason about their actions. Then, we use \textit{ReFineVLA} to fine-tune pre-trained VLAs with the reasoning-enriched datasets, while maintaining their inherent generalization abilities and boosting reasoning capabilities. In addition, we conduct an attention map visualization to analyze the alignment among visual attention, linguistic prompts, and to-be-executed actions of \textit{ReFineVLA}, showcasing its ability to focus on relevant tasks and actions. Through the latter step, we explore that \textit{ReFineVLA}-trained models exhibit a meaningful attention shift towards relevant objects, highlighting the enhanced multimodal understanding and improved generalization. Evaluated across manipulation tasks, \textit{ReFineVLA} outperforms the state-of-the-art baselines. Specifically, it achieves an average increase of $5.0\%$ success rate on SimplerEnv WidowX Robot tasks, improves by an average of $8.6\%$ in variant aggregation settings, and by $1.7\%$ in visual matching settings for SimplerEnv Google Robot tasks. The source code will be publicly available.
comment: 10 pages
Staircase Recognition and Location Based on Polarization Vision
Staircase is one of the most common structures in artificial scenes. However, it is difficult for humanoid robots and people with lower limb disabilities or visual impairment to cross the scene without the help of sensors and intelligent algorithms. Staircase scene perception technology is a prerequisite for recognition and localization. This technology is of great significance for the mode switching of the robot and the calculation of the footprint position to adapt to the discontinuous terrain. However, there are still many problems that constrain the application of this technology, such as low recognition accuracy, high initial noise from sensors, unstable output signals and high computational requirements. In terms of scene reconstruction, the binocular and time of flight (TOF) reconstruction of the scene can be easily affected by environmental light and the surface material of the target object. In contrast, due to the special structure of the polarizer, the polarization can selectively transmit polarized light in a specific direction and this reconstruction method relies on the polarization information of the object surface. So the advantages of polarization reconstruction are reflected, which are less affected by environmental light and not dependent on the texture information of the object surface. In this paper, in order to achieve the detection of staircase, this paper proposes a contrast enhancement algorithm that integrates polarization and light intensity information, and integrates point cloud segmentation based on YOLOv11. To realize the high-quality reconstruction, we proposed a method of fusing polarized binocular and TOF depth information to realize the three-dimensional (3D) reconstruction of the staircase. Besides, it also proposes a joint calibration algorithm of monocular camera and TOF camera based on ICP registration and improved gray wolf optimization algorithm.
WorldEval: World Model as Real-World Robot Policies Evaluator
The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions change. In this work, we demonstrate that world models can serve as a scalable, reproducible, and reliable proxy for real-world robot policy evaluation. A key challenge is generating accurate policy videos from world models that faithfully reflect the robot actions. We observe that directly inputting robot actions or using high-dimensional encoding methods often fails to generate action-following videos. To address this, we propose Policy2Vec, a simple yet effective approach to turn a video generation model into a world simulator that follows latent action to generate the robot video. We then introduce WorldEval, an automated pipeline designed to evaluate real-world robot policies entirely online. WorldEval effectively ranks various robot policies and individual checkpoints within a single policy, and functions as a safety detector to prevent dangerous actions by newly developed robot models. Through comprehensive paired evaluations of manipulation policies in real-world environments, we demonstrate a strong correlation between policy performance in WorldEval and real-world scenarios. Furthermore, our method significantly outperforms popular methods such as real-to-sim approach.
comment: The project page is available at https://worldeval.github.io
Designing Pin-pression Gripper and Learning its Dexterous Grasping with Online In-hand Adjustment
We introduce a novel design of parallel-jaw grippers drawing inspiration from pin-pression toys. The proposed pin-pression gripper features a distinctive mechanism in which each finger integrates a 2D array of pins capable of independent extension and retraction. This unique design allows the gripper to instantaneously customize its finger's shape to conform to the object being grasped by dynamically adjusting the extension/retraction of the pins. In addition, the gripper excels in in-hand re-orientation of objects for enhanced grasping stability again via dynamically adjusting the pins. To learn the dynamic grasping skills of pin-pression grippers, we devise a dedicated reinforcement learning algorithm with careful designs of state representation and reward shaping. To achieve a more efficient grasp-while-lift grasping mode, we propose a curriculum learning scheme. Extensive evaluations demonstrate that our design, together with the learned skills, leads to highly flexible and robust grasping with much stronger generality to unseen objects than alternatives. We also highlight encouraging physical results of sim-to-real transfer on a physically manufactured pin-pression gripper, demonstrating the practical significance of our novel gripper design and grasping skill. Demonstration videos for this paper are available at https://github.com/siggraph-pin-pression-gripper/pin-pression-gripper-video.
Echo Planning for Autonomous Driving: From Current Observations to Future Trajectories and Back
Modern end-to-end autonomous driving systems suffer from a critical limitation: their planners lack mechanisms to enforce temporal consistency between predicted trajectories and evolving scene dynamics. This absence of self-supervision allows early prediction errors to compound catastrophically over time. We introduce Echo Planning, a novel self-correcting framework that establishes a closed-loop Current - Future - Current (CFC) cycle to harmonize trajectory prediction with scene coherence. Our key insight is that plausible future trajectories must be bi-directionally consistent, ie, not only generated from current observations but also capable of reconstructing them. The CFC mechanism first predicts future trajectories from the Bird's-Eye-View (BEV) scene representation, then inversely maps these trajectories back to estimate the current BEV state. By enforcing consistency between the original and reconstructed BEV representations through a cycle loss, the framework intrinsically penalizes physically implausible or misaligned trajectories. Experiments on nuScenes demonstrate state-of-the-art performance, reducing L2 error by 0.04 m and collision rate by 0.12% compared to one-shot planners. Crucially, our method requires no additional supervision, leveraging the CFC cycle as an inductive bias for robust planning. This work offers a deployable solution for safety-critical autonomous systems.
comment: 13 pages, 4 figures
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
Language models uncover unprecedented abilities in analyzing driving scenarios, owing to their limitless knowledge accumulated from text-based pre-training. Naturally, they should particularly excel in analyzing rule-based interactions, such as those triggered by traffic laws, which are well documented in texts. However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. Therefore, we propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive large-scale Q&As dataset built on WOMD focusing on describing and reasoning traffic rule-induced interactions in driving scenarios. WOMD-Reasoning also presents by far the largest multi-modal Q&A dataset, with 3 million Q&As on real-world driving scenarios, covering a wide range of driving topics from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. To showcase the applications of WOMD-Reasoning, we design Motion-LLaVA, a motion-language model fine-tuned on WOMD-Reasoning. Quantitative and qualitative evaluations are performed on WOMD-Reasoning dataset as well as the outputs of Motion-LLaVA, supporting the data quality and wide applications of WOMD-Reasoning, in interaction predictions, traffic rule compliance plannings, etc. The dataset and its vision modal extension are available on https://waymo.com/open/download/. The codes & prompts to build it are available on https://github.com/yhli123/WOMD-Reasoning.
Designing a skilled soccer team for RoboCup: exploring skill-set-primitives through reinforcement learning
The RoboCup 3D Soccer Simulation League serves as a competitive platform for showcasing innovation in autonomous humanoid robot agents through simulated soccer matches. Our team, FC Portugal, developed a new codebase from scratch in Python after RoboCup 2021. The team's performance relies on a set of skills centered around novel unifying primitives and a custom, symmetry-extended version of the Proximal Policy Optimization algorithm. Our methods have been thoroughly tested in official RoboCup matches, where FC Portugal has won the last two main competitions, in 2022 and 2023. This paper presents our training framework, as well as a timeline of skills developed using our skill-set-primitives, which considerably improve the sample efficiency and stability of skills, and motivate seamless transitions. We start with a significantly fast sprint-kick developed in 2021 and progress to the most recent skill set, including a multi-purpose omnidirectional walk, a dribble with unprecedented ball control, a solid kick, and a push skill. The push addresses low-level collision scenarios and high-level strategies to increase ball possession. We address the resource-intensive nature of this task through an innovative multi-agent learning approach. Finally, we release the team's codebase to the RoboCup community, providing other teams with a robust and modern foundation upon which they can build new features.
comment: Codebase release at https://github.com/m-abr/FCPCodebase
Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions
Neural Networks (NNs) have been successfully employed to represent the state evolution of complex dynamical systems. Such models, referred to as NN dynamic models (NNDMs), use iterative noisy predictions of NN to estimate a distribution of system trajectories over time. Despite their accuracy, safety analysis of NNDMs is known to be a challenging problem and remains largely unexplored. To address this issue, in this paper, we introduce a method of providing safety guarantees for NNDMs. Our approach is based on stochastic barrier functions, whose relation with safety are analogous to that of Lyapunov functions with stability. We first show a method of synthesizing stochastic barrier functions for NNDMs via a convex optimization problem, which in turn provides a lower bound on the system's safety probability. A key step in our method is the employment of the recent convex approximation results for NNs to find piece-wise linear bounds, which allow the formulation of the barrier function synthesis problem as a sum-of-squares optimization program. If the obtained safety probability is above the desired threshold, the system is certified. Otherwise, we introduce a method of generating controls for the system that robustly maximizes the safety probability in a minimally-invasive manner. We exploit the convexity property of the barrier function to formulate the optimal control synthesis problem as a linear program. Experimental results illustrate the efficacy of the method. Namely, they show that the method can scale to multi-dimensional NNDMs with multiple layers and hundreds of neurons per layer, and that the controller can significantly improve the safety probability.
Robust Immersive Bilateral Teleoperation of Beyond-Human-Scale Systems with Enhanced Transparency and Sense of Embodiment
In human-in-the-loop systems such as teleoperation, especially those involving heavy-duty manipulators, achieving high task performance requires both robust control and strong human engagement. This paper presents a bilateral teleoperation framework for beyond-human-scale robotic systems that enhances the transparency and the operator's sense of embodiment (SoE), specifically, the senses of agency and self-location, through an immersive virtual reality interface and distributed haptic feedback. To support this embodiment and establish high level of motion and force transparency, we develop a force-sensorless, robust control architecture that tackles input nonlinearities, master-surrogate asymmetries, unknown uncertainties, and arbitrary time delays. A human-robot augmented dynamic model is integrated into the control loop to enhance human-adaptability of the controller. Theoretical analysis confirms semi-global uniform ultimate boundedness of the closed-loop system, guaranteeing the robustness to the real-world uncertainties. Extensive real-world experiments demonstrate high accuracy tracking under up to 1:13 motion scaling and 1:1000 force scaling, showcasing the significance of the results. Additionally, the stability-transparency tradeoff for motion tracking and force reflection and tracking is established up to 150 ms of one-way fix and time-varying communication delays. The results of user study with 10 participants (9 male and 1 female) demonstrate that the system can imply a good level of SoE (76.4%), at the same time is very user friendly with no gender limitation. These results are significant given the scale and weight of the heavy-duty manipulators.
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction
Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and image information into sequential control actions. However, these models often underperform in open-world scenarios, as they are predominantly trained on successful expert demonstrations and exhibit a limited capacity for failure recovery. In this work, we present a Robotic Failure Analysis and Correction (RoboFAC) framework to address this issue. Firstly, we construct RoboFAC dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 16 diverse tasks and 53 scenes in both simulation and real-world environments. Leveraging our dataset, we develop RoboFAC model, which is capable of Task Understanding, Failure Analysis and Failure Correction. Experimental results demonstrate that the RoboFAC model outperforms GPT-4o by 34.1% on our evaluation benchmark. Furthermore, we integrate the RoboFAC model into a real-world VLA control pipeline as an external supervision providing correction instructions, yielding a 29.1% relative improvement on average on four real-world tasks. The results show that our RoboFAC framework effectively handles robotic failures and assists the VLA model in recovering from failures.
On-Demand Scenario Generation for Testing Automated Driving Systems
The safety and reliability of Automated Driving Systems (ADS) are paramount, necessitating rigorous testing methodologies to uncover potential failures before deployment. Traditional testing approaches often prioritize either natural scenario sampling or safety-critical scenario generation, resulting in overly simplistic or unrealistic hazardous tests. In practice, the demand for natural scenarios (e.g., when evaluating the ADS's reliability in real-world conditions), critical scenarios (e.g., when evaluating safety in critical situations), or somewhere in between (e.g., when testing the ADS in regions with less civilized drivers) varies depending on the testing objectives. To address this issue, we propose the On-demand Scenario Generation (OSG) Framework, which generates diverse scenarios with varying risk levels. Achieving the goal of OSG is challenging due to the complexity of quantifying the criticalness and naturalness stemming from intricate vehicle-environment interactions, as well as the need to maintain scenario diversity across various risk levels. OSG learns from real-world traffic datasets and employs a Risk Intensity Regulator to quantitatively control the risk level. It also leverages an improved heuristic search method to ensure scenario diversity. We evaluate OSG on the Carla simulators using various ADSs. We verify OSG's ability to generate scenarios with different risk levels and demonstrate its necessity by comparing accident types across risk levels. With the help of OSG, we are now able to systematically and objectively compare the performance of different ADSs based on different risk levels.
comment: 20 pages, 9 figures. Accepted by FSE 2025
SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short in achieving satisfactory results for tasks requiring long-term planning. Concurrently, we observe that integrating a self-correction module can partially alleviate such issues. Motivated by this concern, we introduce the Single-Step Assembly Error Correction Task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness in manual images and providing corrections when necessary. Finally, we utilize SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset could be found at https://scanet-iros2024.github.io/.
Multiagent Systems
Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning
Poor interpretability hinders the practical applicability of multi-agent reinforcement learning (MARL) policies. Deploying interpretable surrogates of uninterpretable policies enhances the safety and verifiability of MARL for real-world applications. However, if these surrogates are to interact directly with the environment within human supervisory frameworks, they must be both performant and computationally efficient. Prior work on interpretable MARL has either sacrificed performance for computational efficiency or computational efficiency for performance. To address this issue, we propose HYDRAVIPER, a decision tree-based interpretable MARL algorithm. HYDRAVIPER coordinates training between agents based on expected team performance, and adaptively allocates budgets for environment interaction to improve computational efficiency. Experiments on standard benchmark environments for multi-agent coordination and traffic signal control show that HYDRAVIPER matches the performance of state-of-the-art methods using a fraction of the runtime, and that it maintains a Pareto frontier of performance for different interaction budgets.
comment: 16 pages; 2 tables; 11 figures
A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
Traditional Identity and Access Management (IAM) systems, primarily designed for human users or static machine identities via protocols such as OAuth, OpenID Connect (OIDC), and SAML, prove fundamentally inadequate for the dynamic, interdependent, and often ephemeral nature of AI agents operating at scale within Multi Agent Systems (MAS), a computational system composed of multiple interacting intelligent agents that work collectively. This paper posits the imperative for a novel Agentic AI IAM framework: We deconstruct the limitations of existing protocols when applied to MAS, illustrating with concrete examples why their coarse-grained controls, single-entity focus, and lack of context-awareness falter. We then propose a comprehensive framework built upon rich, verifiable Agent Identities (IDs), leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), that encapsulate an agents capabilities, provenance, behavioral scope, and security posture. Our framework includes an Agent Naming Service (ANS) for secure and capability-aware discovery, dynamic fine-grained access control mechanisms, and critically, a unified global session management and policy enforcement layer for real-time control and consistent revocation across heterogeneous agent communication protocols. We also explore how Zero-Knowledge Proofs (ZKPs) enable privacy-preserving attribute disclosure and verifiable policy compliance. We outline the architecture, operational lifecycle, innovative contributions, and security considerations of this new IAM paradigm, aiming to establish the foundational trust, accountability, and security necessary for the burgeoning field of agentic AI and the complex ecosystems they will inhabit.
comment: 24 Pages, 5 figures, 2 tables
Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes
We develop information theory for the temporal behavior of memoryful agents moving through complex -- structured, stochastic -- environments. We introduce information processes -- stochastic processes produced by cognitive agents in real-time as they interact with and interpret incoming stimuli. We provide basic results on the ergodicity and semantics of the resulting time series of Shannon information measures that monitor an agent's adapting view of uncertainty and structural correlation in its environment.
comment: 24 pages, 10 figures, 1 appendix; http://csc.ucdavis.edu/~cmg/compmech/pubs/iprocesses.htm
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration face critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture incorporating an incremental training paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization.
Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
comment: 112 pages, 21 figures, 20 tables
OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization
Hyperparameter optimization (HPO) is a critical yet challenging aspect of machine learning model development, significantly impacting model performance and generalization. Traditional HPO methods often struggle with high dimensionality, complex interdependencies, and computational expense. This paper introduces OptiMindTune, a novel multi-agent framework designed to intelligently and efficiently optimize hyperparameters. OptiMindTune leverages the collaborative intelligence of three specialized AI agents -- a Recommender Agent, an Evaluator Agent, and a Decision Agent -- each powered by Google's Gemini models. These agents address distinct facets of the HPO problem, from model selection and hyperparameter suggestion to robust evaluation and strategic decision-making. By fostering dynamic interactions and knowledge sharing, OptiMindTune aims to converge to optimal hyperparameter configurations more rapidly and robustly than existing single-agent or monolithic approaches. Our framework integrates principles from advanced large language models, and adaptive search to achieve scalable and intelligent AutoML. We posit that this multi-agent paradigm offers a promising avenue for tackling the increasing complexity of modern machine learning model tuning.
comment: 7 pages, 2 tables
Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management
Motivated by dynamic parameter optimization in finite, but large action (configurations) spaces, this work studies the nonstochastic multi-armed bandit (MAB) problem in metric action spaces with oblivious Lipschitz adversaries. We propose ABoB, a hierarchical Adversarial Bandit over Bandits algorithm that can use state-of-the-art existing "flat" algorithms, but additionally clusters similar configurations to exploit local structures and adapt to changing environments. We prove that in the worst-case scenario, such clustering approach cannot hurt too much and ABoB guarantees a standard worst-case regret bound of $O\left(k^{\frac{1}{2}}T^{\frac{1}{2}}\right)$, where $T$ is the number of rounds and $k$ is the number of arms, matching the traditional flat approach. However, under favorable conditions related to the algorithm properties, clusters properties, and certain Lipschitz conditions, the regret bound can be improved to $O\left(k^{\frac{1}{4}}T^{\frac{1}{2}}\right)$. Simulations and experiments on a real storage system demonstrate that ABoB, using standard algorithms like EXP3 and Tsallis-INF, achieves lower regret and faster convergence than the flat method, up to 50% improvement in known previous setups, nonstochastic and stochastic, as well as in our settings.
SANNet: A Semantic-Aware Agentic AI Networking Framework for Multi-Agent Cross-Layer Coordination
Agentic AI networking (AgentNet) is a novel AI-native networking paradigm that relies on a large number of specialized AI agents to collaborate and coordinate for autonomous decision-making, dynamic environmental adaptation, and complex goal achievement. It has the potential to facilitate real-time network management alongside capabilities for self-configuration, self-optimization, and self-adaptation across diverse and complex networking environments, laying the foundation for fully autonomous networking systems in the future. Despite its promise, AgentNet is still in the early stage of development, and there still lacks an effective networking framework to support automatic goal discovery and multi-agent self-orchestration and task assignment. This paper proposes SANNet, a novel semantic-aware agentic AI networking architecture that can infer the semantic goal of the user and automatically assign agents associated with different layers of a mobile system to fulfill the inferred goal. Motivated by the fact that one of the major challenges in AgentNet is that different agents may have different and even conflicting objectives when collaborating for certain goals, we introduce a dynamic weighting-based conflict-resolving mechanism to address this issue. We prove that SANNet can provide theoretical guarantee in both conflict-resolving and model generalization performance for multi-agent collaboration in dynamic environment. We develop a hardware prototype of SANNet based on the open RAN and 5GS core platform. Our experimental results show that SANNet can significantly improve the performance of multi-agent networking systems, even when agents with conflicting objectives are selected to collaborate for the same goal.
comment: submitted to IEEE GLOBECOM'25
Collaborative Agentic AI Needs Interoperability Across Ecosystems
Collaborative agentic AI is projected to transform entire industries by enabling AI-powered agents to autonomously perceive, plan, and act within digital environments. Yet, current solutions in this field are all built in isolation, and we are rapidly heading toward a landscape of fragmented, incompatible ecosystems. In this position paper, we argue that interoperability, achieved by the adoption of minimal standards, is essential to ensure open, secure, web-scale, and widely-adopted agentic ecosystems. To this end, we devise a minimal architectural foundation for collaborative agentic AI, named Web of Agents, which is composed of four components: agent-to-agent messaging, interaction interoperability, state management, and agent discovery. Web of Agents adopts existing standards and reuses existing infrastructure where possible. With Web of Agents, we take the first but critical step toward interoperable agentic systems and offer a pragmatic path forward before ecosystem fragmentation becomes the norm.
Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments
This study introduces a novel inference framework, designated as Bayesian and inverse Bayesian (BIB) inference, which concurrently performs both conventional and inverse Bayesian updates by integrating symmetry bias into Bayesian inference. The effectiveness of the model was evaluated through a sequential estimation task involving observations sampled from a Gaussian distribution with a stochastically time-varying mean. Conventional Bayesian inference entails a fundamental trade-off between adaptability to abrupt environmental shifts and estimation accuracy during stable intervals. The BIB framework addresses this limitation by dynamically modulating the learning rate through inverse Bayesian updates, thereby enhancing adaptive flexibility. The BIB model generated spontaneous bursts in the learning rate during sudden environmental transitions, transiently entering a high-sensitivity state to accommodate incoming data. This intermittent burst-relaxation pattern functions as a dynamic mechanism that balances adaptability and accuracy. Further analysis of burst interval distributions demonstrated that the BIB model consistently produced power-law distributions under diverse conditions. Such robust scaling behavior, absent in conventional Bayesian inference, appears to emerge from a self-regulatory mechanism driven by inverse Bayesian updates. These results present a novel computational perspective on scale-free phenomena in natural systems and offer implications for designing adaptive inference systems in nonstationary environments.
Interacting Large Language Model Agents. Interpretable Models and Social Learning
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under 2 settings: (a) centrally controlled LLMAs (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
comment: 41 Pages
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System ACL 2025
The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address the limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.
comment: Accepted by ACL 2025 Main Conference
Systems and Control (CS)
Passive Vibration Control of a 3-D Printer Gantry
Improved additive manufacturing capabilities are vital for the future development and improvement of ubiquitous robotic systems. These machines can be integrated into existing robotic systems to allow manufacturing and repair of components, as well as fabrication of custom parts for the robots themselves. The fused filament fabrication (FFF) process is one of the most common and well-developed AM processes but suffers from the effects of vibration-induced position error, particularly as the printing speed is raised. This project adapted and expanded a dynamic model of an FFF gantry system to include a passive spring-mass-damper system controller attached to the extruder carriage and tuned using optimal parameters. A case study was conducted to demonstrate the effects and generate recommendations for implementation. This work is also valuable for other mechatronic systems which operate using an open-loop control system and which suffer from vibration, including numerous robotic systems, pick-and-place machines, positioners, and similar.
comment: 7 pages, 4 figures, 2 tables, 31 equations
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $\epsilon$ sub-optimality and feasible policy after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search, thus, we reduce the computation time by at least 4x for smaller value of discount factor ($\gamma$) and by at least 6x for larger value of $\gamma$.
FedORA: Resource Allocation for Federated Learning in ORAN using Radio Intelligent Controllers
This work proposes an integrated approach for optimising Federated Learning (FL) communication in dynamic and heterogeneous network environments. Leveraging the modular flexibility of the Open Radio Access Network (ORAN) architecture and multiple Radio Access Technologies (RATs), we aim to enhance data transmission efficiency and mitigate client-server communication constraints within the FL framework. Our system employs a two-stage optimisation strategy using ORAN's rApps and xApps. In the first stage, Reinforcement Learning (RL) based rApp is used to dynamically select each user's optimal Radio Access Technology (RAT), balancing energy efficiency with network performance. In the second stage, a model-based xApp facilitates near-real-time resource allocation optimisation through predefined policies to achieve optimal network performance. The dynamic RAT selection and resource allocation capabilities enabled by ORAN and multi-RAT contribute to robust communication resilience in dynamic network environments. Our approach demonstrates competitive performance with low power consumption compared to other state-of-the-art models, showcasing its potential for real-time applications demanding both accuracy and efficiency. This robust and comprehensive framework, enabling clients to utilise available resources effectively, highlights the potential for scalable, collaborative learning applications prioritising energy efficiency and network performance.
comment: poster in the 2025 EuCNC & 6G Summit
An Autocovariance Least-Squares-Based Data-Driven Kalman Filter for Unknown Systems
This article investigates the problem of data-driven state estimation for linear systems with both unknown system dynamics and noise covariances. We propose an Autocovariance Least-squares-based Data-driven Kalman Filter (ADKF), which provides a unified framework for simultaneous system identification and state estimation by utilizing pre-collected input-output trajectories and estimated initial states. Specifically, we design a SDP-based algorithm for estimating the noise covariances. We quantify the impact of model inaccuracy on noise covariances estimation using this identification algorithm, and introduce a feedback control mechanism for data collection to enhance the accuracy and stability of noise covariance estimation. The estimated noise covariances account for model inaccuracy, which are shown to be more suitable for state estimation. We also quantify the performance gap between the ADKF and the traditional Kalman filter with known system dynamics and noise covariances, showing that this gap decreases as the number and length of pre-collected trajectories increase. Finally, numerical simulations validate the robustness and effectiveness of the proposed ADKF.
Robust Stability Analysis of Positive Lure System with Neural Network Feedback
This paper investigates the robustness of the Lur'e problem under positivity constraints, drawing on results from the positive Aizerman conjecture and the robustness properties of Metzler matrices. Specifically, we consider a control system of Lur'e type in which not only the linear part includes parametric uncertainty but also the nonlinear sector bound is unknown. We investigate tools from positive linear systems to effectively solve the problems in complicated and uncertain nonlinear systems. By leveraging the positivity characteristic of the system, we derive an explicit formula for the stability radius of Lur'e systems. Furthermore, we extend our analysis to systems with neural network (NN) feedback loops. Building on this approach, we also propose a refinement method for sector bounds of feedforward neural networks (FFNNs). This study introduces a scalable and efficient approach for robustness analysis of both Lur'e and NN-controlled systems. Finally, the proposed results are supported by illustrative examples.
comment: Accepted at the 9th IEEE Conference on Control Technology and Applications (CCTA) 2025, San Diego, California
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions
Neural Networks (NNs) have been successfully employed to represent the state evolution of complex dynamical systems. Such models, referred to as NN dynamic models (NNDMs), use iterative noisy predictions of NN to estimate a distribution of system trajectories over time. Despite their accuracy, safety analysis of NNDMs is known to be a challenging problem and remains largely unexplored. To address this issue, in this paper, we introduce a method of providing safety guarantees for NNDMs. Our approach is based on stochastic barrier functions, whose relation with safety are analogous to that of Lyapunov functions with stability. We first show a method of synthesizing stochastic barrier functions for NNDMs via a convex optimization problem, which in turn provides a lower bound on the system's safety probability. A key step in our method is the employment of the recent convex approximation results for NNs to find piece-wise linear bounds, which allow the formulation of the barrier function synthesis problem as a sum-of-squares optimization program. If the obtained safety probability is above the desired threshold, the system is certified. Otherwise, we introduce a method of generating controls for the system that robustly maximizes the safety probability in a minimally-invasive manner. We exploit the convexity property of the barrier function to formulate the optimal control synthesis problem as a linear program. Experimental results illustrate the efficacy of the method. Namely, they show that the method can scale to multi-dimensional NNDMs with multiple layers and hundreds of neurons per layer, and that the controller can significantly improve the safety probability.
Interacting Large Language Model Agents. Interpretable Models and Social Learning
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under 2 settings: (a) centrally controlled LLMAs (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
comment: 41 Pages
Multi-Partite Output Regulation of Multi-Agent Systems
This article proposes a simple, graph-independent perspective on partitioning the node set of a graph and provides multi-agent systems (MASs) with objectives beyond cooperation and bipartition. Specifically, we first introduce the notion of $k$-partition transformation to achieve any desired partition of the nodes. Then, we use this notion to formulate the multi-partite output regulation problem (MORP) of heterogeneous linear MASs, which comprises the existing cooperative output regulation problem (CORP) and bipartite output regulation problem (BORP) as subcases. The goal of the MORP is to design a distributed control law such that each follower that belongs to the same set in the partition asymptotically tracks a scalar multiple of the reference while ensuring the internal stability of the closed-loop system. It is shown that the necessary and sufficient conditions for the solvability of the MORP with a feedforward-based distributed control law follow from the CORP and lead to the first design strategy for the control parameters. However, it has a drawback in terms of scalability due to a partition-dependent condition. We prove that this condition is implied by its partition-independent version under a mild structural condition. This implication yields the second design strategy that is much more scalable than the first one. Finally, an experiment is conducted to demonstrate the MORP's flexibility, and two numerical examples are provided to illustrate its generality and compare both design strategies regarding scalability.
comment: Under review
Data-Based Control of Continuous-Time Linear Systems with Performance Specifications
The design of direct data-based controllers has become a fundamental part of control theory research in the last few years. In this paper, we consider three classes of data-based state feedback control problems for linear systems. These control problems are such that, besides stabilization, some additional performance requirements must be satisfied. First, we formulate and solve a trajectory-reference control problem, on which desired closed-loop trajectories are known and a controller that allows the system to closely follow those trajectories is computed. Then, the solution of the LQR problem for continuous-time systems is presented. Finally, we consider the case in which the precise position of the desired poles of the closed-loop system is known, and introduce a data-based variant of a robust pole-placement procedure. The applicability of the proposed methods is tested using numerical simulations.
comment: 11 pages, 1 figure
Systems and Control (EESS)
Passive Vibration Control of a 3-D Printer Gantry
Improved additive manufacturing capabilities are vital for the future development and improvement of ubiquitous robotic systems. These machines can be integrated into existing robotic systems to allow manufacturing and repair of components, as well as fabrication of custom parts for the robots themselves. The fused filament fabrication (FFF) process is one of the most common and well-developed AM processes but suffers from the effects of vibration-induced position error, particularly as the printing speed is raised. This project adapted and expanded a dynamic model of an FFF gantry system to include a passive spring-mass-damper system controller attached to the extruder carriage and tuned using optimal parameters. A case study was conducted to demonstrate the effects and generate recommendations for implementation. This work is also valuable for other mechatronic systems which operate using an open-loop control system and which suffer from vibration, including numerous robotic systems, pick-and-place machines, positioners, and similar.
comment: 7 pages, 4 figures, 2 tables, 31 equations
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $\epsilon$ sub-optimality and feasible policy after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search, thus, we reduce the computation time by at least 4x for smaller value of discount factor ($\gamma$) and by at least 6x for larger value of $\gamma$.
FedORA: Resource Allocation for Federated Learning in ORAN using Radio Intelligent Controllers
This work proposes an integrated approach for optimising Federated Learning (FL) communication in dynamic and heterogeneous network environments. Leveraging the modular flexibility of the Open Radio Access Network (ORAN) architecture and multiple Radio Access Technologies (RATs), we aim to enhance data transmission efficiency and mitigate client-server communication constraints within the FL framework. Our system employs a two-stage optimisation strategy using ORAN's rApps and xApps. In the first stage, Reinforcement Learning (RL) based rApp is used to dynamically select each user's optimal Radio Access Technology (RAT), balancing energy efficiency with network performance. In the second stage, a model-based xApp facilitates near-real-time resource allocation optimisation through predefined policies to achieve optimal network performance. The dynamic RAT selection and resource allocation capabilities enabled by ORAN and multi-RAT contribute to robust communication resilience in dynamic network environments. Our approach demonstrates competitive performance with low power consumption compared to other state-of-the-art models, showcasing its potential for real-time applications demanding both accuracy and efficiency. This robust and comprehensive framework, enabling clients to utilise available resources effectively, highlights the potential for scalable, collaborative learning applications prioritising energy efficiency and network performance.
comment: poster in the 2025 EuCNC & 6G Summit
An Autocovariance Least-Squares-Based Data-Driven Kalman Filter for Unknown Systems
This article investigates the problem of data-driven state estimation for linear systems with both unknown system dynamics and noise covariances. We propose an Autocovariance Least-squares-based Data-driven Kalman Filter (ADKF), which provides a unified framework for simultaneous system identification and state estimation by utilizing pre-collected input-output trajectories and estimated initial states. Specifically, we design a SDP-based algorithm for estimating the noise covariances. We quantify the impact of model inaccuracy on noise covariances estimation using this identification algorithm, and introduce a feedback control mechanism for data collection to enhance the accuracy and stability of noise covariance estimation. The estimated noise covariances account for model inaccuracy, which are shown to be more suitable for state estimation. We also quantify the performance gap between the ADKF and the traditional Kalman filter with known system dynamics and noise covariances, showing that this gap decreases as the number and length of pre-collected trajectories increase. Finally, numerical simulations validate the robustness and effectiveness of the proposed ADKF.
Robust Stability Analysis of Positive Lure System with Neural Network Feedback
This paper investigates the robustness of the Lur'e problem under positivity constraints, drawing on results from the positive Aizerman conjecture and the robustness properties of Metzler matrices. Specifically, we consider a control system of Lur'e type in which not only the linear part includes parametric uncertainty but also the nonlinear sector bound is unknown. We investigate tools from positive linear systems to effectively solve the problems in complicated and uncertain nonlinear systems. By leveraging the positivity characteristic of the system, we derive an explicit formula for the stability radius of Lur'e systems. Furthermore, we extend our analysis to systems with neural network (NN) feedback loops. Building on this approach, we also propose a refinement method for sector bounds of feedforward neural networks (FFNNs). This study introduces a scalable and efficient approach for robustness analysis of both Lur'e and NN-controlled systems. Finally, the proposed results are supported by illustrative examples.
comment: Accepted at the 9th IEEE Conference on Control Technology and Applications (CCTA) 2025, San Diego, California
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions
Neural Networks (NNs) have been successfully employed to represent the state evolution of complex dynamical systems. Such models, referred to as NN dynamic models (NNDMs), use iterative noisy predictions of NN to estimate a distribution of system trajectories over time. Despite their accuracy, safety analysis of NNDMs is known to be a challenging problem and remains largely unexplored. To address this issue, in this paper, we introduce a method of providing safety guarantees for NNDMs. Our approach is based on stochastic barrier functions, whose relation with safety are analogous to that of Lyapunov functions with stability. We first show a method of synthesizing stochastic barrier functions for NNDMs via a convex optimization problem, which in turn provides a lower bound on the system's safety probability. A key step in our method is the employment of the recent convex approximation results for NNs to find piece-wise linear bounds, which allow the formulation of the barrier function synthesis problem as a sum-of-squares optimization program. If the obtained safety probability is above the desired threshold, the system is certified. Otherwise, we introduce a method of generating controls for the system that robustly maximizes the safety probability in a minimally-invasive manner. We exploit the convexity property of the barrier function to formulate the optimal control synthesis problem as a linear program. Experimental results illustrate the efficacy of the method. Namely, they show that the method can scale to multi-dimensional NNDMs with multiple layers and hundreds of neurons per layer, and that the controller can significantly improve the safety probability.
Interacting Large Language Model Agents. Interpretable Models and Social Learning
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under 2 settings: (a) centrally controlled LLMAs (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
comment: 41 Pages
Multi-Partite Output Regulation of Multi-Agent Systems
This article proposes a simple, graph-independent perspective on partitioning the node set of a graph and provides multi-agent systems (MASs) with objectives beyond cooperation and bipartition. Specifically, we first introduce the notion of $k$-partition transformation to achieve any desired partition of the nodes. Then, we use this notion to formulate the multi-partite output regulation problem (MORP) of heterogeneous linear MASs, which comprises the existing cooperative output regulation problem (CORP) and bipartite output regulation problem (BORP) as subcases. The goal of the MORP is to design a distributed control law such that each follower that belongs to the same set in the partition asymptotically tracks a scalar multiple of the reference while ensuring the internal stability of the closed-loop system. It is shown that the necessary and sufficient conditions for the solvability of the MORP with a feedforward-based distributed control law follow from the CORP and lead to the first design strategy for the control parameters. However, it has a drawback in terms of scalability due to a partition-dependent condition. We prove that this condition is implied by its partition-independent version under a mild structural condition. This implication yields the second design strategy that is much more scalable than the first one. Finally, an experiment is conducted to demonstrate the MORP's flexibility, and two numerical examples are provided to illustrate its generality and compare both design strategies regarding scalability.
comment: Under review
Data-Based Control of Continuous-Time Linear Systems with Performance Specifications
The design of direct data-based controllers has become a fundamental part of control theory research in the last few years. In this paper, we consider three classes of data-based state feedback control problems for linear systems. These control problems are such that, besides stabilization, some additional performance requirements must be satisfied. First, we formulate and solve a trajectory-reference control problem, on which desired closed-loop trajectories are known and a controller that allows the system to closely follow those trajectories is computed. Then, the solution of the LQR problem for continuous-time systems is presented. Finally, we consider the case in which the precise position of the desired poles of the closed-loop system is known, and introduce a data-based variant of a robust pole-placement procedure. The applicability of the proposed methods is tested using numerical simulations.
comment: 11 pages, 1 figure
Robotics
Beyond Domain Randomization: Event-Inspired Perception for Visually Robust Adversarial Imitation from Videos
Imitation from videos often fails when expert demonstrations and learner environments exhibit domain shifts, such as discrepancies in lighting, color, or texture. While visual randomization partially addresses this problem by augmenting training data, it remains computationally intensive and inherently reactive, struggling with unseen scenarios. We propose a different approach: instead of randomizing appearances, we eliminate their influence entirely by rethinking the sensory representation itself. Inspired by biological vision systems that prioritize temporal transients (e.g., retinal ganglion cells) and by recent sensor advancements, we introduce event-inspired perception for visually robust imitation. Our method converts standard RGB videos into a sparse, event-based representation that encodes temporal intensity gradients, discarding static appearance features. This biologically grounded approach disentangles motion dynamics from visual style, enabling robust visual imitation from observations even in the presence of visual mismatches between expert and agent environments. By training policies on event streams, we achieve invariance to appearance-based distractors without requiring computationally expensive and environment-specific data augmentation techniques. Experiments across the DeepMind Control Suite and the Adroit platform for dynamic dexterous manipulation show the efficacy of our method. Our code is publicly available at Eb-LAIfO.
SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes
We present the Semantics-aware Dataset and Benchmark Generation Pipeline for Open-vocabulary Object Navigation in Dynamic Scenes (SD-OVON). It utilizes pretraining multimodal foundation models to generate infinite unique photo-realistic scene variants that adhere to real-world semantics and daily commonsense for the training and the evaluation of navigation agents, accompanied with a plugin for generating object navigation task episodes compatible to the Habitat simulator. In addition, we offer two pre-generated object navigation task datasets, SD-OVON-3k and SD-OVON-10k, comprising respectively about 3k and 10k episodes of the open-vocabulary object navigation task, derived from the SD-OVON-Scenes dataset with 2.5k photo-realistic scans of real-world environments and the SD-OVON-Objects dataset with 0.9k manually inspected scanned and artist-created manipulatable object models. Unlike prior datasets limited to static environments, SD-OVON covers dynamic scenes and manipulatable objects, facilitating both real-to-sim and sim-to-real robotic applications. This approach enhances the realism of navigation tasks, the training and the evaluation of open-vocabulary object navigation agents in complex settings. To demonstrate the effectiveness of our pipeline and datasets, we propose two baselines and evaluate them along with state-of-the-art baselines on SD-OVON-3k. The datasets, benchmark and source code are publicly available.
comment: Preprint. 21 pages
DiffusionRL: Efficient Training of Diffusion Policies for Robotic Grasping Using RL-Adapted Large-Scale Datasets
Diffusion models have been successfully applied in areas such as image, video, and audio generation. Recent works show their promise for sequential decision-making and dexterous manipulation, leveraging their ability to model complex action distributions. However, challenges persist due to the data limitations and scenario-specific adaptation needs. In this paper, we address these challenges by proposing an optimized approach to training diffusion policies using large, pre-built datasets that are enhanced using Reinforcement Learning (RL). Our end-to-end pipeline leverages RL-based enhancement of the DexGraspNet dataset, lightweight diffusion policy training on a dexterous manipulation task for a five-fingered robotic hand, and a pose sampling algorithm for validation. The pipeline achieved a high success rate of 80% for three DexGraspNet objects. By eliminating manual data collection, our approach lowers barriers to adopting diffusion models in robotics, enhancing generalization and robustness for real-world applications.
comment: Submitted to CoRL 2025
Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Safety stands as the primary obstacle preventing the widespread adoption of learning-based robotic systems in our daily lives. While reinforcement learning (RL) shows promise as an effective robot learning paradigm, conventional RL frameworks often model safety by using single scalar negative rewards with immediate episode termination, failing to capture the temporal consequences of unsafe actions (e.g., sustained collision damage). In this work, we introduce a novel approach that simulates these temporal effects by applying continuous negative rewards without episode termination. Our experiments reveal that standard RL methods struggle with this model, as the accumulated negative values in unsafe zones create learning barriers. To address this challenge, we demonstrate how Control Barrier Functions (CBFs), with their proven safety guarantees, effectively help robots avoid catastrophic regions while enhancing learning outcomes. We present three CBF-based approaches, each integrating traditional RL methods with Control Barrier Functions, guiding the agent to learn safe behavior. Our empirical analysis, conducted in both simulated environments and real-world settings using a four-wheel differential drive robot, explores the possibilities of employing these approaches for safe robotic learning.
Discrete gradient methods for port-Hamiltonian differential-algebraic equations
Discrete gradient methods are a powerful tool for the time discretization of dynamical systems, since they are structure-preserving regardless of the form of the total energy. In this work, we discuss the application of discrete gradient methods to the system class of nonlinear port-Hamiltonian differential-algebraic equations - as they emerge from the port- and energy-based modeling of physical systems in various domains. We introduce a novel numerical scheme tailored for semi-explicit differential-algebraic equations and further address more general settings using the concepts of discrete gradient pairs and Dirac-dissipative structures. Additionally, the behavior under system transformations is investigated and we demonstrate that under suitable assumptions port-Hamiltonian differential-algebraic equations admit a representation which consists of a parametrized port-Hamiltonian semi-explicit system and an unstructured equation. Finally, we present the application to multibody system dynamics and discuss numerical results to demonstrate the capabilities of our approach.
comment: 36 pages (8 pages appendix), 6 figures
Distributed Expectation Propagation for Multi-Object Tracking over Sensor Networks
In this paper, we present a novel distributed expectation propagation algorithm for multiple sensors, multiple objects tracking in cluttered environments. The proposed framework enables each sensor to operate locally while collaboratively exchanging moment estimates with other sensors, thus eliminating the need to transmit all data to a central processing node. Specifically, we introduce a fast and parallelisable Rao-Blackwellised Gibbs sampling scheme to approximate the tilted distributions, which enhances the accuracy and efficiency of expectation propagation updates. Results demonstrate that the proposed algorithm improves both communication and inference efficiency for multi-object tracking tasks with dynamic sensor connectivity and varying clutter levels.
Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance
While Vision-Language-Action (VLA) models show strong generalizability in various tasks, real-world deployment of robotic policy still requires large-scale, high-quality human expert demonstrations. However, passive data collection via human teleoperation is costly, hard to scale, and often biased toward passive demonstrations with limited diversity. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance. When the robot execution failures occur, GCENT enables the system revert to a previous state with a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. This framework supports a one-human-to-many-robots supervision scheme with a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary, enabling scalable supervision. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods, and reaches comparable performance using less than half the data. We also quantify the data yield-to-effort ratio under multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
On the Dual-Use Dilemma in Physical Reasoning and Force
Humans learn how and when to apply forces in the world via a complex physiological and psychological learning process. Attempting to replicate this in vision-language models (VLMs) presents two challenges: VLMs can produce harmful behavior, which is particularly dangerous for VLM-controlled robots which interact with the world, but imposing behavioral safeguards can limit their functional and ethical extents. We conduct two case studies on safeguarding VLMs which generate forceful robotic motion, finding that safeguards reduce both harmful and helpful behavior involving contact-rich manipulation of human body parts. Then, we discuss the key implication of this result--that value alignment may impede desirable robot capabilities--for model evaluation and robot learning.
One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion
Humanoid locomotion faces a critical scalability challenge: traditional reinforcement learning (RL) methods require task-specific rewards and struggle to leverage growing datasets, even as more training terrains are introduced. We propose DreamPolicy, a unified framework that enables a single policy to master diverse terrains and generalize zero-shot to unseen scenarios by systematically integrating offline data and diffusion-driven motion synthesis. At its core, DreamPolicy introduces Humanoid Motion Imagery (HMI) - future state predictions synthesized through an autoregressive terrain-aware diffusion planner curated by aggregating rollouts from specialized policies across various distinct terrains. Unlike human motion datasets requiring laborious retargeting, our data directly captures humanoid kinematics, enabling the diffusion planner to synthesize "dreamed" trajectories that encode terrain-specific physical constraints. These trajectories act as dynamic objectives for our HMI-conditioned policy, bypassing manual reward engineering and enabling cross-terrain generalization. DreamPolicy addresses the scalability limitations of prior methods: while traditional RL fails to exploit growing datasets, our framework scales seamlessly with more offline data. As the dataset expands, the diffusion prior learns richer locomotion skills, which the policy leverages to master new terrains without retraining. Experiments demonstrate that DreamPolicy achieves average 90% success rates in training environments and an average of 20% higher success on unseen terrains than the prevalent method. It also generalizes to perturbed and composite scenarios where prior approaches collapse. By unifying offline data, diffusion-based trajectory synthesis, and policy optimization, DreamPolicy overcomes the "one task, one policy" bottleneck, establishing a paradigm for scalable, data-driven humanoid control.
Mobile Manipulation Planning for Tabletop Rearrangement
Efficient tabletop rearrangement planning seeks to find high-quality solutions while minimizing total cost. However, the task is challenging due to object dependencies and limited buffer space for temporary placements. The complexity increases for mobile robots, which must navigate around the table with restricted access. A*-based methods yield high-quality solutions, but struggle to scale as the number of objects increases. Monte Carlo Tree Search (MCTS) has been introduced as an anytime algorithm, but its convergence speed to high-quality solutions remains slow. Previous work~\cite{strap2024} accelerated convergence but required the robot to move to the closest position to the object for each pick and place operation, leading to inefficiencies. To address these limitations, we extend the planner by introducing a more efficient strategy for mobile robots. Instead of selecting the nearest available location for each action, our approach allows multiple operations (e.g., pick-and-place) from a single standing position, reducing unnecessary movement. Additionally, we incorporate state re-exploration to further improve plan quality. Experimental results show that our planner outperforms existing planners both in terms of solution quality and planning time.
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states will cause execution failure in out-of-distribution scenarios. Intuitively, an exploration-based method that improves on online collected data at test time could address this limitation. We present VLA-RL, an algorithmic and systematic framework that leverages online reinforcement learning (RL) to improve pretrained auto-regressive VLAs in downstream tasks. Within a unified perspective, we first introduce a trajectory-level RL formulation for auto-regressive VLA training, which models general robotic manipulation trajectory as multi-modal multi-turn conversation. To address the challenge of sparse rewards, we fine-tune a pretrained vision-language model as a robotic process reward model, which is trained on pseudo reward labels annotated on automatically extracted task segments. To scale up, we identify several implementation findings that improve the stability and efficiency including curriculum selection strategy, GPU-balanced vectorized environments, batch decoding, and critic warmup. VLA-RL enables OpenVLA-7B to surpass the strongest finetuned baseline by 4.5% on 40 challenging robotic manipulation tasks in LIBERO, and even matches the performance of advanced commercial models such as $\pi_0$-FAST. Notably, we observe that VLA-RL benefits from increased test-time optimization, indicating an early spark of inference scaling laws in robotics.
YOPO-Rally: A Sim-to-Real Single-Stage Planner for Off-Road Terrain
Off-road navigation remains challenging for autonomous robots due to the harsh terrain and clustered obstacles. In this letter, we extend the YOPO (You Only Plan Once) end-to-end navigation framework to off-road environments, explicitly focusing on forest terrains, consisting of a high-performance, multi-sensor supported off-road simulator YOPO-Sim, a zero-shot transfer sim-to-real planner YOPO-Rally, and an MPC controller. Built on the Unity engine, the simulator can generate randomized forest environments and export depth images and point cloud maps for expert demonstrations, providing competitive performance with mainstream simulators. Terrain Traversability Analysis (TTA) processes cost maps, generating expert trajectories represented as non-uniform cubic Hermite curves. The planner integrates TTA and the pathfinding into a single neural network that inputs the depth image, current velocity, and the goal vector, and outputs multiple trajectory candidates with costs. The planner is trained by behavior cloning in the simulator and deployed directly into the real-world without fine-tuning. Finally, a series of simulated and real-world experiments is conducted to validate the performance of the proposed framework.
comment: 8 pages, 8 figures
Coordinated guidance and control for multiple parafoil system landing
Multiple parafoil landing is an enabling technology for massive supply delivery missions. However, it is still an open question to design a collision-free, computation-efficient guidance and control method for unpowered parafoils. To address this issue, this paper proposes a coordinated guidance and control method for multiple parafoil landing. First, the multiple parafoil landing process is formulated as a trajectory optimization problem. Then, the landing point allocation algorithm is designed to assign the landing point to each parafoil. In order to guarantee flight safety, the collision-free trajectory replanning algorithm is designed. On this basis, the nonlinear model predictive control algorithm is adapted to leverage the nonlinear dynamics model for trajectory tracking. Finally, the parafoil kinematic model is utilized to reduce the computational burden of trajectory calculation, and kinematic model is updated by the moving horizon correction algorithm to improve the trajectory accuracy. Simulation results demonstrate the effectiveness and computational efficiency of the proposed coordinated guidance and control method for the multiple parafoil landing.
Supporting Preschool Emotional Development with AI-Powered Robots
This study evaluates the integration of AI-powered robots in early childhood education, focusing on their impact on emotional self-regulation, engagement, and collaborative skills. A ten-week experimental design involving two groups of children assessed the robot's effectiveness through progress assessments, parental surveys, and teacher feedback. Results demonstrated that early exposure to the robot significantly enhanced emotional recognition, while sustained interaction further improved collaborative and social engagement. Parental and teacher feedback highlighted high acceptance levels, emphasizing the robot's ease of integration and positive influence on classroom dynamics. This research underscores the transformative potential of AI and robotics in education. The findings advocate for the broader adoption of AI-powered interventions, carefully examining equitable access, ethical considerations, and sustainable implementation. This work sets a foundation for exploring long-term impacts and expanding applications of AI in inclusive and impactful educational settings.
comment: This is the preprint version of a paper accepted at the 24th ACM Interaction Design and Children Conference, IDC 2025. The final published version will be available via ACM Digital Library
S2R-Bench: A Sim-to-Real Evaluation Benchmark for Autonomous Driving
Safety is a long-standing and the final pursuit in the development of autonomous driving systems, with a significant portion of safety challenge arising from perception. How to effectively evaluate the safety as well as the reliability of perception algorithms is becoming an emerging issue. Despite its critical importance, existing perception methods exhibit a limitation in their robustness, primarily due to the use of benchmarks are entierly simulated, which fail to align predicted results with actual outcomes, particularly under extreme weather conditions and sensor anomalies that are prevalent in real-world scenarios. To fill this gap, in this study, we propose a Sim-to-Real Evaluation Benchmark for Autonomous Driving (S2R-Bench). We collect diverse sensor anomaly data under various road conditions to evaluate the robustness of autonomous driving perception methods in a comprehensive and realistic manner. This is the first corruption robustness benchmark based on real-world scenarios, encompassing various road conditions, weather conditions, lighting intensities, and time periods. By comparing real-world data with simulated data, we demonstrate the reliability and practical significance of the collected data for real-world applications. We hope that this dataset will advance future research and contribute to the development of more robust perception models for autonomous driving. This dataset is released on https://github.com/adept-thu/S2R-Bench.
Optimization-Based Trajectory Planning for Tractor-Trailer Vehicles on Curvy Roads: A Progressively Increasing Sampling Number Method
In this work, we propose an optimization-based trajectory planner for tractor-trailer vehicles on curvy roads. The lack of analytical expression for the trailer's errors to the center line pose a great challenge to the trajectory planning for tractor-trailer vehicles. To address this issue, we first use geometric representations to characterize the lateral and orientation errors in Cartesian frame, where the errors would serve as the components of the cost function and the road edge constraints within our optimization process. Next, we generate a coarse trajectory to warm-start the subsequent optimization problems. On the other hand, to achieve a good approximation of the continuous-time kinematics, optimization-based methods usually discretize the kinematics with a large sampling number. This leads to an increase in the number of the variables and constraints, thus making the optimization problem difficult to solve. To address this issue, we design a Progressively Increasing Sampling Number Optimization (PISNO) framework. More specifically, we first find a nearly feasible trajectory with a small sampling number to warm-start the optimization process. Then, the sampling number is progressively increased, and the corresponding intermediate Optimal Control Problem (OCP) is solved in each iteration. Next, we further resample the obtained solution into a finer sampling period, and then use it to warm-start the intermediate OCP in next iteration. This process is repeated until reaching a threshold sampling number. Simulation and experiment results show the proposed method exhibits a good performance and less computational consumption over the benchmarks.
Applying Ontologies and Knowledge Augmented Large Language Models to Industrial Automation: A Decision-Making Guidance for Achieving Human-Robot Collaboration in Industry 5.0
The rapid advancement of Large Language Models (LLMs) has resulted in interest in their potential applications within manufacturing systems, particularly in the context of Industry 5.0. However, determining when to implement LLMs versus other Natural Language Processing (NLP) techniques, ontologies or knowledge graphs, remains an open question. This paper offers decision-making guidance for selecting the most suitable technique in various industrial contexts, emphasizing human-robot collaboration and resilience in manufacturing. We examine the origins and unique strengths of LLMs, ontologies, and knowledge graphs, assessing their effectiveness across different industrial scenarios based on the number of domains or disciplines required to bring a product from design to manufacture. Through this comparative framework, we explore specific use cases where LLMs could enhance robotics for human-robot collaboration, while underscoring the continued relevance of ontologies and knowledge graphs in low-dependency or resource-constrained sectors. Additionally, we address the practical challenges of deploying these technologies, such as computational cost and interpretability, providing a roadmap for manufacturers to navigate the evolving landscape of Language based AI tools in Industry 5.0. Our findings offer a foundation for informed decision-making, helping industry professionals optimize the use of Language Based models for sustainable, resilient, and human-centric manufacturing. We also propose a Large Knowledge Language Model architecture that offers the potential for transparency and configuration based on complexity of task and computing resources available.
An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU
Accurately estimating vehicle velocity via smartphone is critical for mobile navigation and transportation. This paper introduces a cutting-edge framework for velocity estimation that incorporates temporal learning models, utilizing Inertial Measurement Unit (IMU) data and is supervised by Global Navigation Satellite System (GNSS) information. The framework employs a noise compensation network to fit the noise distribution between sensor measurements and actual motion, and a pose estimation network to align the coordinate systems of the phone and the vehicle. To enhance the model's generalizability, a data augmentation technique that mimics various phone placements within the car is proposed. Moreover, a new loss function is designed to mitigate timestamp mismatches between GNSS and IMU signals, effectively aligning the signals and improving the velocity estimation accuracy. Finally, we implement a highly efficient prototype and conduct extensive experiments on a real-world crowdsourcing dataset, resulting in superior accuracy and efficiency.
Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
Learning effective visual representations for robotic manipulation remains a fundamental challenge due to the complex body dynamics involved in action execution. In this paper, we study how visual representations that carry body-relevant cues can enable efficient policy learning for downstream robotic manipulation tasks. We present $\textbf{I}$nter-token $\textbf{Con}$trast ($\textbf{ICon}$), a contrastive learning method applied to the token-level representations of Vision Transformers (ViTs). ICon enforces a separation in the feature space between agent-specific and environment-specific tokens, resulting in agent-centric visual representations that embed body-specific inductive biases. This framework can be seamlessly integrated into end-to-end policy learning by incorporating the contrastive loss as an auxiliary objective. Our experiments show that ICon not only improves policy performance across various manipulation tasks but also facilitates policy transfer across different robots. The project website: https://github.com/HenryWJL/icon
comment: A preprint version
Canonical Policy: Learning Canonical 3D Representation for Equivariant Policy
Visual Imitation learning has achieved remarkable progress in robotic manipulation, yet generalization to unseen objects, scene layouts, and camera viewpoints remains a key challenge. Recent advances address this by using 3D point clouds, which provide geometry-aware, appearance-invariant representations, and by incorporating equivariance into policy architectures to exploit spatial symmetries. However, existing equivariant approaches often lack interpretability and rigor due to unstructured integration of equivariant components. We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies 3D point cloud observations under a canonical representation. We first establish a theory of 3D canonical representations, enabling equivariant observation-to-action mappings by grouping both in-distribution and out-of-distribution point clouds to a canonical representation. We then propose a flexible policy learning pipeline that leverages geometric symmetries from canonical representation and the expressiveness of modern generative models. We validate canonical policy on 12 diverse simulated tasks and 4 real-world manipulation tasks across 16 configurations, involving variations in object color, shape, camera viewpoint, and robot platform. Compared to state-of-the-art imitation learning policies, canonical policy achieves an average improvement of 18.0% in simulation and 37.6% in real-world experiments, demonstrating superior generalization capability and sample efficiency. For more details, please refer to the project website: https://zhangzhiyuanzhang.github.io/cp-website/.
ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning
Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual input, such as operations in confined spaces, dimly lit environments, or scenarios where perceiving the object's properties and state is critical for task success. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervised visuomotor policies has benefited greatly from high-quality, reproducible simulation benchmarks in visual imitation, the visuotactile domain still lacks a similarly comprehensive and reliable benchmark for large-scale and rigorous evaluation. To address this, we introduce ManiFeel, a reproducible and scalable simulation benchmark for studying supervised visuotactile manipulation policies across a diverse set of tasks and scenarios. ManiFeel presents a comprehensive benchmark suite spanning a diverse set of manipulation tasks, evaluating various policies, input modalities, and tactile representation methods. Through extensive experiments, our analysis reveals key factors that influence supervised visuotactile policy learning, identifies the types of tasks where tactile sensing is most beneficial, and highlights promising directions for future research in visuotactile policy learning. ManiFeel aims to establish a reproducible benchmark for supervised visuotactile policy learning, supporting progress in visuotactile manipulation and perception. To facilitate future research and ensure reproducibility, we will release our codebase, datasets, training logs, and pretrained checkpoints. Please visit the project website for more details: https://zhengtongxu.github.io/manifeel-website/
Curio: A Cost-Effective Solution for Robotics Education
Student engagement is one of the key challenges in robotics and artificial intelligence (AI) education. Tangible learning approaches, such as educational robots, provide an effective way to enhance engagement and learning by offering real-world applications to bridge the gap between theory and practice. However, existing platforms often face barriers such as high cost or limited capabilities. In this paper, we present Curio, a cost-effective, smartphone-integrated robotics platform designed to lower the entry barrier to robotics and AI education. With a retail price below $50, Curio is more affordable than similar platforms. By leveraging smartphones, Curio eliminates the need for onboard processing units, dedicated cameras, and additional sensors while maintaining the ability to perform AI-based tasks. To evaluate the impact of Curio on student engagement, we conducted a case study with 20 participants, where we examined usability, engagement, and potential for integrating into AI and robotics education. The results indicate high engagement and motivation levels across all participants. Additionally, 95% of participants reported an improvement in their understanding of robotics. Findings suggest that using a robotic system such as Curio can enhance engagement and hands-on learning in robotics and AI education. All resources and projects with Curio are available at trycurio.com.
ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models
Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. However, existing methods rely heavily on costly human-annotated training datasets, which limits their generalization and causes them to struggle in out-of-domain (OOD) scenarios, reducing real-world adaptability. To address these challenges, we propose ManipLVM-R1, a novel reinforcement learning framework that replaces traditional supervision with Reinforcement Learning using Verifiable Rewards (RLVR). By directly optimizing for task-aligned outcomes, our method enhances generalization and physical reasoning while removing the dependence on costly annotations. Specifically, we design two rule-based reward functions targeting key robotic manipulation subtasks: an Affordance Perception Reward to enhance localization of interaction regions, and a Trajectory Match Reward to ensure the physical plausibility of action paths. These rewards provide immediate feedback and impose spatial-logical constraints, encouraging the model to go beyond shallow pattern matching and instead learn deeper, more systematic reasoning about physical interactions.
comment: 14pages
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
The generalization capabilities of vision-language-action (VLA) models to unseen tasks are crucial to achieving general-purpose robotic manipulation in open-world settings. However, the cross-task generalization capabilities of existing VLA models remain significantly underexplored. To address this gap, we introduce AGNOSTOS, a novel simulation benchmark designed to rigorously evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS comprises 23 unseen manipulation tasks for testing, distinct from common training task distributions, and incorporates two levels of generalization difficulty to assess robustness. Our systematic evaluation reveals that current VLA models, despite being trained on diverse datasets, struggle to generalize effectively to these unseen tasks. To overcome this limitation, we propose Cross-Task In-Context Manipulation (X-ICM), a method that conditions large language models (LLMs) on in-context demonstrations from seen tasks to predict action sequences for unseen tasks. Additionally, we introduce a dynamics-guided sample selection strategy that identifies relevant demonstrations by capturing cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and X-ICM will serve as valuable tools for advancing general-purpose robotic manipulation.
comment: Project Page: https://jiaming-zhou.github.io/AGNOSTOS
RoboCulture: A Robotics Platform for Automated Biological Experimentation
Automating biological experimentation remains challenging due to the need for millimeter-scale precision, long and multi-step experiments, and the dynamic nature of living systems. Current liquid handlers only partially automate workflows, requiring human intervention for plate loading, tip replacement, and calibration. Industrial solutions offer more automation but are costly and lack the flexibility needed in research settings. Meanwhile, research in autonomous robotics has yet to bridge the gap for long-duration, failure-sensitive biological experiments. We introduce RoboCulture, a cost-effective and flexible platform that uses a general-purpose robotic manipulator to automate key biological tasks. RoboCulture performs liquid handling, interacts with lab equipment, and leverages computer vision for real-time decisions using optical density-based growth monitoring. We demonstrate a fully autonomous 15-hour yeast culture experiment where RoboCulture uses vision and force feedback and a modular behavior tree framework to robustly execute, monitor, and manage experiments. Video demonstrations of RoboCulture can be found at https://ac-rad.github.io/roboculture.
GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets
Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-LoCO, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations from two distinct locomotion behaviors - stair traversal (non-periodic gaits) and flat terrain traversal (periodic gaits) - collected across multiple quadruped robots, to train a generalist model that enables behavior fusion. Crucially, our framework operates solely on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 nuc, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70kg quadruped, on flat and outdoor terrains without requiring any fine tuning. These results demonstrate the potential of offline, data-driven learning to generalize locomotion across diverse quadruped morphologies and behaviors.
comment: 18pages, 16figures, 6tables
Value Gradients with Action Adaptive Search Trees in Continuous (PO)MDPs
Solving Partially Observable Markov Decision Processes (POMDPs) in continuous state, action and observation spaces is key for autonomous planning in many real-world mobility and robotics applications. Current approaches are mostly sample based, and cannot hope to reach near-optimal solutions in reasonable time. We propose two complementary theoretical contributions. First, we formulate a novel Multiple Importance Sampling (MIS) tree for value estimation, that allows to share value information between sibling action branches. The novel MIS tree supports action updates during search time, such as gradient-based updates. Second, we propose a novel methodology to compute value gradients with online sampling based on transition likelihoods. It is applicable to MDPs, and we extend it to POMDPs via particle beliefs with the application of the propagated belief trick. The gradient estimator is computed in practice using the MIS tree with efficient Monte Carlo sampling. These two parts are combined into a new planning algorithm Action Gradient Monte Carlo Tree Search (AGMCTS). We demonstrate in a simulated environment its applicability, advantages over continuous online POMDP solvers that rely solely on sampling, and we discuss further implications.
Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice
Solving partially observable Markov decision processes (POMDPs) with high dimensional and continuous observations, such as camera images, is required for many real life robotics and planning problems. Recent researches suggested machine learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We deal with the question of what would be the implication of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value w.r.t. original model, from the empirical planned value with the simplified model, by generalizing recent results of particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
AGI-Elo: How Far Are We From Mastering A Task?
As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating distributions offer novel perspectives and interpretable insights into task difficulty, model progression, and the outstanding challenges that remain on the path to achieving full AGI task mastery.
LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes ICRA
We present LiDAR-EDIT, a novel paradigm for generating synthetic LiDAR data for autonomous driving. Our framework edits real-world LiDAR scans by introducing new object layouts while preserving the realism of the background environment. Compared to end-to-end frameworks that generate LiDAR point clouds from scratch, LiDAR-EDIT offers users full control over the object layout, including the number, type, and pose of objects, while keeping most of the original real-world background. Our method also provides object labels for the generated data. Compared to novel view synthesis techniques, our framework allows for the creation of counterfactual scenarios with object layouts significantly different from the original real-world scene. LiDAR-EDIT uses spherical voxelization to enforce correct LiDAR projective geometry in the generated point clouds by construction. During object removal and insertion, generative models are employed to fill the unseen background and object parts that were occluded in the original real LiDAR scans. Experimental results demonstrate that our framework produces realistic LiDAR scans with practical value for downstream tasks.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA). 6 pages, 7 figures
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.
ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera ICRA
In this paper, we tackle the problem of grasping transparent and specular objects. This issue holds importance, yet it remains unsolved within the field of robotics due to failure of recover their accurate geometry by depth cameras. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo network for the purpose of transparent object reconstruction, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D based grasp detection methods, which heavily depend on depth restoration networks and the quality of depth maps generated by depth cameras, our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction. We create an extensive synthetic dataset through domain randomization, which is based on GraspNet-1Billion. Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping in both simulation and the real via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs.Project page: https://pku-epic.github.io/ASGrasp
comment: IEEE International Conference on Robotics and Automation (ICRA), 2024
Multiagent Systems
Coordinated guidance and control for multiple parafoil system landing
Multiple parafoil landing is an enabling technology for massive supply delivery missions. However, it is still an open question to design a collision-free, computation-efficient guidance and control method for unpowered parafoils. To address this issue, this paper proposes a coordinated guidance and control method for multiple parafoil landing. First, the multiple parafoil landing process is formulated as a trajectory optimization problem. Then, the landing point allocation algorithm is designed to assign the landing point to each parafoil. In order to guarantee flight safety, the collision-free trajectory replanning algorithm is designed. On this basis, the nonlinear model predictive control algorithm is adapted to leverage the nonlinear dynamics model for trajectory tracking. Finally, the parafoil kinematic model is utilized to reduce the computational burden of trajectory calculation, and kinematic model is updated by the moving horizon correction algorithm to improve the trajectory accuracy. Simulation results demonstrate the effectiveness and computational efficiency of the proposed coordinated guidance and control method for the multiple parafoil landing.
DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation
Large Language Models (LLMs) demonstrate strong generalization and reasoning abilities, making them well-suited for complex decision-making tasks such as medical consultation (MC). However, existing LLM-based methods often fail to capture the dual nature of MC, which entails two distinct sub-tasks: symptom inquiry, a sequential decision-making process, and disease diagnosis, a classification problem. This mismatch often results in ineffective symptom inquiry and unreliable disease diagnosis. To address this, we propose \textbf{DDO}, a novel LLM-based framework that performs \textbf{D}ual-\textbf{D}ecision \textbf{O}ptimization by decoupling and independently optimizing the the two sub-tasks through a collaborative multi-agent workflow. Experiments on three real-world MC datasets show that DDO consistently outperforms existing LLM-based approaches and achieves competitive performance with state-of-the-art generation-based methods, demonstrating its effectiveness in the MC task.
comment: 17 pages, 4 figures
MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations
We study offline imitation learning (IL) in cooperative multi-agent settings, where demonstrations have unlabeled mixed quality - containing both expert and suboptimal trajectories. Our proposed solution is structured in two stages: trajectory labeling and multi-agent imitation learning, designed jointly to enable effective learning from heterogeneous, unlabeled data. In the first stage, we combine advances in large language models and preference-based reinforcement learning to construct a progressive labeling pipeline that distinguishes expert-quality trajectories. In the second stage, we introduce MisoDICE, a novel multi-agent IL algorithm that leverages these labels to learn robust policies while addressing the computational complexity of large joint state-action spaces. By extending the popular single-agent DICE framework to multi-agent settings with a new value decomposition and mixing architecture, our method yields a convex policy optimization objective and ensures consistency between global and local policies. We evaluate MisoDICE on multiple standard multi-agent RL benchmarks and demonstrate superior performance, especially when expert data is scarce.
MASTER: Multi-Agent Security Through Exploration of Roles and Topological Structures -- A Comprehensive Framework
Large Language Models (LLMs)-based Multi-Agent Systems (MAS) exhibit remarkable problem-solving and task planning capabilities across diverse domains due to their specialized agentic roles and collaborative interactions. However, this also amplifies the severity of security risks under MAS attacks. To address this, we introduce MASTER, a novel security research framework for MAS, focusing on diverse Role configurations and Topological structures across various scenarios. MASTER offers an automated construction process for different MAS setups and an information-flow-based interaction paradigm. To tackle MAS security challenges in varied scenarios, we design a scenario-adaptive, extensible attack strategy utilizing role and topological information, which dynamically allocates targeted, domain-specific attack tasks for collaborative agent execution. Our experiments demonstrate that such an attack, leveraging role and topological information, exhibits significant destructive potential across most models. Additionally, we propose corresponding defense strategies, substantially enhancing MAS resilience across diverse scenarios. We anticipate that our framework and findings will provide valuable insights for future research into MAS security challenges.
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
Medical Large Vision-Language Models (Med-LVLMs) have been widely adopted for medical report generation. Despite Med-LVLMs producing state-of-the-art performance, they exhibit a bias toward predicting all findings as normal, leading to reports that overlook critical abnormalities. Furthermore, these models often fail to provide comprehensive descriptions of radiologically relevant regions necessary for accurate diagnosis. To address these challenges, we proposeMedical Report Generation Agents (MRGAgents), a novel multi-agent framework that fine-tunes specialized agents for different disease categories. By curating subsets of the IU X-ray and MIMIC-CXR datasets to train disease-specific agents, MRGAgents generates reports that more effectively balance normal and abnormal findings while ensuring a comprehensive description of clinically relevant regions. Our experiments demonstrate that MRGAgents outperformed the state-of-the-art, improving both report comprehensiveness and diagnostic utility.
comment: 10pages
EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks
This paper introduces EdgeAgentX, a novel framework integrating federated learning (FL), multi-agent reinforcement learning (MARL), and adversarial defense mechanisms, tailored for military communication networks. EdgeAgentX significantly improves autonomous decision-making, reduces latency, enhances throughput, and robustly withstands adversarial disruptions, as evidenced by comprehensive simulations.
comment: 6 pages, 2 figures
Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning
Actor-critic methods for decentralized multi-agent reinforcement learning (MARL) facilitate collaborative optimal decision making without centralized coordination, thus enabling a wide range of applications in practice. To date, however, most theoretical convergence studies for existing actor-critic decentralized MARL methods are limited to the guarantee of a stationary solution under the linear function approximation. This leaves a significant gap between the highly successful use of deep neural actor-critic for decentralized MARL in practice and the current theoretical understanding. To bridge this gap, in this paper, we make the first attempt to develop a deep neural actor-critic method for decentralized MARL, where both the actor and critic components are inherently non-linear. We show that our proposed method enjoys a global optimality guarantee with a finite-time convergence rate of O(1/T), where T is the total iteration times. This marks the first global convergence result for deep neural actor-critic methods in the MARL literature. We also conduct extensive numerical experiments, which verify our theoretical results.
Distributed Set-membership Filtering Frameworks For Multi-agent Systems With Absolute and Relative Measurements
In this paper, we focus on the distributed set-membership filtering (SMFing) problem for a multi-agent system with absolute (taken from agents themselves) and relative (taken from neighbors) measurements. In the literature, the relative measurements are difficult to deal with, and the SMFs highly rely on specific set descriptions. As a result, establishing the general distributed SMFing framework having relative measurements is still an open problem. To solve this problem, first, we provide the set description based on uncertain variables determined by the relative measurements between two agents as the foundation. Surprisingly, the accurate description requires only a single calculation step rather than multiple iterations, which can effectively reduce computational complexity. Based on the derived set description, called the uncertain range, we propose two distributed SMFing frameworks: one calculates the joint uncertain range of the agent itself and its neighbors, while the other only computes the marginal uncertain range of each local system. Furthermore, we compare the performance of our proposed two distributed SMFing frameworks and the benchmark -- centralized SMFing framework. A rigorous set analysis reveals that the distributed SMF can be essentially considered as the process of computing the marginal uncertain range to outer bound the projection of the uncertain range obtained by the centralized SMF in the corresponding subspace. Simulation results corroborate the effectiveness of our proposed distributed frameworks and verify our theoretical analysis.
comment: The derivation of equation 4 -6 is not fully correct, which is the foundation of marginal based framework. Thus, we would like to withdraw this version
Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi AAMAS
The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various number of cooperators. However, this often leads to complex solution strategies with high computational cost and requiring large amounts of training data. For humans to solve the Hanabi game effectively, they require the use of conventions, which often allows for a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of "rules" or principles. Multi-agent problems containing partial observability, especially when limited communication is present, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent's action space using conventions, which act as a sequence of special cooperative actions that span over and include multiple time steps and multiple agents, requiring agents to actively opt in for it to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement on the performance of existing techniques for self-play and cross-play for various number of cooperators within Hanabi.
comment: This paper is accepted and published in the journal of autonomous agents and multi-agent systems (JAAMAS). The updated manuscript is the revised version after the final round of peer revision
An Identity Based Agent Model for Value Alignment
Social identities play an important role in the dynamics of human societies, and it can be argued that some sense of identification with a larger cause or idea plays a critical role in making humans act responsibly. Often social activists strive to get populations to identify with some cause or notion -- like green energy, diversity, etc. in order to bring about desired social changes. We explore the problem of designing computational models for social identities in the context of autonomous AI agents. For this, we propose an agent model that enables agents to identify with certain notions and show how this affects collective outcomes. We also contrast between associations of identity with rational preferences. The proposed model is simulated in an application context of urban mobility, where we show how changes in social identity affect mobility patterns and collective outcomes.
Group Trip Planning Query Problem with Multimodal Journey
In Group Trip Planning (GTP) Query Problem, we are given a city road network where a number of Points of Interest (PoI) have been marked with their respective categories (e.g., Cafeteria, Park, Movie Theater, etc.). A group of agents want to visit one PoI from every category from their respective starting location and once finished, they want to reach their respective destinations. This problem asks which PoI from every category should be chosen so that the aggregated travel cost of the group is minimized. This problem has been studied extensively in the last decade, and several solution approaches have been proposed. However, to the best of our knowledge, none of the existing studies have considered the different modalities of the journey, which makes the problem more practical. To bridge this gap, we introduce and study the GTP Query Problem with Multimodal Journey in this paper. Along with the other inputs of the GTP Query Problem, we are also given the different modalities of the journey that are available and their respective cost. Now, the problem is not only to select the PoIs from respective categories but also to select the modality of the journey. For this problem, we have proposed an efficient solution approach, which has been analyzed to understand their time and space requirements. A large number of experiments have been conducted using real-life datasets and the results have been reported. From the results, we observe that the PoIs and modality of journey recommended by the proposed solution approach lead to much less time and cost than the baseline methods.
comment: 11 Pages
TextArena
TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of environments, games, leaderboard, and examples are available on https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
comment: Work in progress; 5 pages, 3 figures
Systems and Control (CS)
Three-Dimensional Nonlinear Guidance with Impact Time and Field-of-view Constraints
This paper addresses the time-constrained interception of targets at a predetermined time with bounded field-of-view capability of the seeker-equipped interceptors. We propose guidance laws using the effective lead angle and velocity lead angles of the interceptor to achieve a successful interception of the target. The former scheme extends the existing two-dimensional guidance strategy to a three-dimensional setting. We have shown that such an extension may result in high-frequency switching in the input demand, which may degrade the interceptor's performance. To overcome the potential limitations of such a guidance strategy, we propose an elegant solution using the velocity lead angles and the range error with a backstepping technique. Using the velocity lead angles as virtual inputs, the effective lead angle profile is subsequently regulated to satisfy the seeker's field-of-view bound. Unlike the existing strategies, the proposed guidance strategy does not rely on the time-to-go estimate, which is an appealing feature of the design, as the time-to-go estimate may not always be available with high precision. We provide a theoretical analysis of the error variable and subsequently analytically derive the bounds on achievable impact times. Numerical simulations are performed to support the theoretical findings. The performance of the proposed guidance strategy is compared with that of an existing one, and it has been shown to yield better performance. Finally, a study on different choices of virtual inputs is also provided.
Supermartingale Certificates for Quantitative Omega-regular Verification and Control
We present the first supermartingale certificate for quantitative $\omega$-regular properties of discrete-time infinite-state stochastic systems. Our certificate is defined on the product of the stochastic system and a limit-deterministic B\"uchi automaton that specifies the property of interest; hence we call it a limit-deterministic B\"uchi supermartingale (LDBSM). Previously known supermartingale certificates applied only to quantitative reachability, safety, or reach-avoid properties, and to qualitative (i.e., probability 1) $\omega$-regular properties. We also present fully automated algorithms for the template-based synthesis of LDBSMs, for the case when the stochastic system dynamics and the controller can be represented in terms of polynomial inequalities. Our experiments demonstrate the ability of our method to solve verification and control tasks for stochastic systems that were beyond the reach of previous supermartingale-based approaches.
comment: Accepted at CAV 2025
Discrete gradient methods for port-Hamiltonian differential-algebraic equations
Discrete gradient methods are a powerful tool for the time discretization of dynamical systems, since they are structure-preserving regardless of the form of the total energy. In this work, we discuss the application of discrete gradient methods to the system class of nonlinear port-Hamiltonian differential-algebraic equations - as they emerge from the port- and energy-based modeling of physical systems in various domains. We introduce a novel numerical scheme tailored for semi-explicit differential-algebraic equations and further address more general settings using the concepts of discrete gradient pairs and Dirac-dissipative structures. Additionally, the behavior under system transformations is investigated and we demonstrate that under suitable assumptions port-Hamiltonian differential-algebraic equations admit a representation which consists of a parametrized port-Hamiltonian semi-explicit system and an unstructured equation. Finally, we present the application to multibody system dynamics and discuss numerical results to demonstrate the capabilities of our approach.
comment: 36 pages (8 pages appendix), 6 figures
Agent-Based Decentralized Energy Management of EV Charging Station with Solar Photovoltaics via Multi-Agent Reinforcement Learning SC2
In the pursuit of energy net zero within smart cities, transportation electrification plays a pivotal role. The adoption of Electric Vehicles (EVs) keeps increasing, making energy management of EV charging stations critically important. While previous studies have managed to reduce energy cost of EV charging while maintaining grid stability, they often overlook the robustness of EV charging management against uncertainties of various forms, such as varying charging behaviors and possible faults in faults in some chargers. To address the gap, a novel Multi-Agent Reinforcement Learning (MARL) approach is proposed treating each charger to be an agent and coordinate all the agents in the EV charging station with solar photovoltaics in a more realistic scenario, where system faults may occur. A Long Short-Term Memory (LSTM) network is incorporated in the MARL algorithm to extract temporal features from time-series. Additionally, a dense reward mechanism is designed for training the agents in the MARL algorithm to improve EV charging experience. Through validation on a real-world dataset, we show that our approach is robust against system uncertainties and faults and also effective in minimizing EV charging costs and maximizing charging service satisfaction.
comment: 2024 IEEE International Smart Cities Conference (ISC2)
Optimization-Based Trajectory Planning for Tractor-Trailer Vehicles on Curvy Roads: A Progressively Increasing Sampling Number Method
In this work, we propose an optimization-based trajectory planner for tractor-trailer vehicles on curvy roads. The lack of analytical expression for the trailer's errors to the center line pose a great challenge to the trajectory planning for tractor-trailer vehicles. To address this issue, we first use geometric representations to characterize the lateral and orientation errors in Cartesian frame, where the errors would serve as the components of the cost function and the road edge constraints within our optimization process. Next, we generate a coarse trajectory to warm-start the subsequent optimization problems. On the other hand, to achieve a good approximation of the continuous-time kinematics, optimization-based methods usually discretize the kinematics with a large sampling number. This leads to an increase in the number of the variables and constraints, thus making the optimization problem difficult to solve. To address this issue, we design a Progressively Increasing Sampling Number Optimization (PISNO) framework. More specifically, we first find a nearly feasible trajectory with a small sampling number to warm-start the optimization process. Then, the sampling number is progressively increased, and the corresponding intermediate Optimal Control Problem (OCP) is solved in each iteration. Next, we further resample the obtained solution into a finer sampling period, and then use it to warm-start the intermediate OCP in next iteration. This process is repeated until reaching a threshold sampling number. Simulation and experiment results show the proposed method exhibits a good performance and less computational consumption over the benchmarks.
A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond
Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been demonstrated, achieving higher data rates and energy efficiencies requires high-order coherent modulations like 16-QAM. Current coherent links rely on energy-intensive digital signal processing (DSP) for channel impairment compensation and carrier phase recovery (CPR), which consumes approximately 50pJ/b - 10x higher than future intra-data center requirements. For shorter links, simpler or DSP-free CPR methods can significantly reduce power and complexity. While Costas loops enable CPR for QPSK, they face challenges in scaling to higher-order modulations (e.g., 16/64-QAM) due to varying symbol amplitudes. In this work, we propose an optical coherent link architecture using laser forwarding and a novel DSP-free CPR system using offset-QAM modulation. The proposed analog CPR feedback loop is highly scalable, capable of supporting arbitrary offset-QAM modulations without requiring architectural modifications. This scalability is achieved through its phase error detection mechanism, which operates independently of the data rate and modulation type. We validated this method using GlobalFoundry's monolithic 45nm silicon photonics PDK models, with circuit- and system-level implementation at 100GBaud in the O-band. We will investigate the feedback loop dynamics, circuit-level implementations, and phase-noise performance of the proposed CPR loop. Our method can be adopted to realize low-power QAM optical interconnects for future coherent-lite pluggable transceivers as well as co-packaged optics (CPO) applications.
A Unifying Perspective for Safety of Stochastic Systems: From Barrier Functions to Finite Abstractions
Providing safety guarantees for stochastic dynamical systems is a central problem in various fields, including control theory, machine learning, and robotics. Existing methods either employ Stochastic Barrier Functions (SBFs) or rely on numerical approaches based on finite abstractions. SBFs, analogous to Lyapunov functions, are used to establish (probabilistic) set invariance, whereas abstraction-based approaches approximate the stochastic system with a finite model to compute safety probability bounds. This paper presents a unifying perspective on these seemingly different approaches. Specifically, we show that both methods can be interpreted as approximations of a stochastic dynamic programming problem. This perspective allows us to formally establish the correctness of both techniques, characterize their convergence and optimality properties, and analyze their respective assumptions, advantages, and limitations. Our analysis reveals that, unlike SBFs-based methods, abstraction-based approaches can provide asymptotically optimal safety certificates, albeit at the cost of increased computational effort.
comment: Submitted to IEEE Transaction on Automatic Control
A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning
We present a large real-world dataset obtained from monitoring a smart company facility over the course of six years, from 2018 to 2023. The dataset includes energy consumption data from various facility areas and components, energy production data from a photovoltaic system and a combined heat and power plant, operational data from heating and cooling systems, and weather data from an on-site weather station. The measurement sensors installed throughout the facility are organized in a hierarchical metering structure with multiple sub-metering levels, which is reflected in the dataset. The dataset contains measurement data from 72 energy meters, 9 heat meters and a weather station. Both raw and processed data at different processing levels, including labeled issues, is available. In this paper, we describe the data acquisition and post-processing employed to create the dataset. The dataset enables the application of a wide range of methods in the domain of energy management, including optimization, modeling, and machine learning to optimize building operations and reduce costs and carbon emissions.
comment: 19 pages, 9 figures
Dual Acceleration for Minimax Optimization: Linear Convergence Under Relaxed Assumptions
This paper addresses the bilinearly coupled minimax optimization problem: $\min_{x \in \mathbb{R}^{d_x}}\max_{y \in \mathbb{R}^{d_y}} \ f_1(x) + f_2(x) + y^{\top} Bx - g_1(y) - g_2(y)$, where $f_1$ and $g_1$ are smooth convex functions, $f_2$ and $g_2$ are potentially nonsmooth convex functions, and $B$ is a coupling matrix. Existing algorithms for solving this problem achieve linear convergence only under stronger conditions, which may not be met in many scenarios. We first introduce the Primal-Dual Proximal Gradient (PDPG) method and demonstrate that it converges linearly under an assumption where existing algorithms fail to achieve linear convergence. Building on insights gained from analyzing the convergence conditions of existing algorithms and PDPG, we further propose the inexact Dual Accelerated Proximal Gradient (iDAPG) method. This method achieves linear convergence under weaker conditions than those required by existing approaches. Moreover, even in cases where existing methods guarantee linear convergence, iDAPG can still provide superior theoretical performance in certain scenarios.
Data-Driven Subsynchronous Oscillation Suppression for Renewable Energy Integrated Power Systems Based on Koopman Operator
Recently, subsynchronous oscillations (SSOs) have emerged frequently worldwide, with the high penetration of renewable power generation in modern power systems. The SSO introduced by renewables has become a prominent new stability problem, seriously threatening the stable operation of systems. This paper proposes a data-driven dynamic optimal controller for renewable energy integrated power systems, to suppress SSOs with the control of renewables. The challenges of the controller design are the nonlinearity, complexity and hard accessibility of the system models. Using Koopman operator, the system dynamics are accurately extracted from data and utilized to the linear model predictive control (MPC). Firstly, the globally linear representation of the system dynamics is obtained by lifting, and the key states are selected as control signals by analyzing Koopman participation factors. Subsequently, augmented with the control term, the Koopman linear parameter-varying predictor of the controlled system is constructed. Finally, using MPC, the proposed controller computes control signals online in a moving horizon fashion. Case studies show that the proposed controller is effective, adaptive and robust in various conditions, surpassing other controllers with reliable control performance.
Systems and Control (EESS)
Three-Dimensional Nonlinear Guidance with Impact Time and Field-of-view Constraints
This paper addresses the time-constrained interception of targets at a predetermined time with bounded field-of-view capability of the seeker-equipped interceptors. We propose guidance laws using the effective lead angle and velocity lead angles of the interceptor to achieve a successful interception of the target. The former scheme extends the existing two-dimensional guidance strategy to a three-dimensional setting. We have shown that such an extension may result in high-frequency switching in the input demand, which may degrade the interceptor's performance. To overcome the potential limitations of such a guidance strategy, we propose an elegant solution using the velocity lead angles and the range error with a backstepping technique. Using the velocity lead angles as virtual inputs, the effective lead angle profile is subsequently regulated to satisfy the seeker's field-of-view bound. Unlike the existing strategies, the proposed guidance strategy does not rely on the time-to-go estimate, which is an appealing feature of the design, as the time-to-go estimate may not always be available with high precision. We provide a theoretical analysis of the error variable and subsequently analytically derive the bounds on achievable impact times. Numerical simulations are performed to support the theoretical findings. The performance of the proposed guidance strategy is compared with that of an existing one, and it has been shown to yield better performance. Finally, a study on different choices of virtual inputs is also provided.
Supermartingale Certificates for Quantitative Omega-regular Verification and Control
We present the first supermartingale certificate for quantitative $\omega$-regular properties of discrete-time infinite-state stochastic systems. Our certificate is defined on the product of the stochastic system and a limit-deterministic B\"uchi automaton that specifies the property of interest; hence we call it a limit-deterministic B\"uchi supermartingale (LDBSM). Previously known supermartingale certificates applied only to quantitative reachability, safety, or reach-avoid properties, and to qualitative (i.e., probability 1) $\omega$-regular properties. We also present fully automated algorithms for the template-based synthesis of LDBSMs, for the case when the stochastic system dynamics and the controller can be represented in terms of polynomial inequalities. Our experiments demonstrate the ability of our method to solve verification and control tasks for stochastic systems that were beyond the reach of previous supermartingale-based approaches.
comment: Accepted at CAV 2025
Discrete gradient methods for port-Hamiltonian differential-algebraic equations
Discrete gradient methods are a powerful tool for the time discretization of dynamical systems, since they are structure-preserving regardless of the form of the total energy. In this work, we discuss the application of discrete gradient methods to the system class of nonlinear port-Hamiltonian differential-algebraic equations - as they emerge from the port- and energy-based modeling of physical systems in various domains. We introduce a novel numerical scheme tailored for semi-explicit differential-algebraic equations and further address more general settings using the concepts of discrete gradient pairs and Dirac-dissipative structures. Additionally, the behavior under system transformations is investigated and we demonstrate that under suitable assumptions port-Hamiltonian differential-algebraic equations admit a representation which consists of a parametrized port-Hamiltonian semi-explicit system and an unstructured equation. Finally, we present the application to multibody system dynamics and discuss numerical results to demonstrate the capabilities of our approach.
comment: 36 pages (8 pages appendix), 6 figures
Agent-Based Decentralized Energy Management of EV Charging Station with Solar Photovoltaics via Multi-Agent Reinforcement Learning SC2
In the pursuit of energy net zero within smart cities, transportation electrification plays a pivotal role. The adoption of Electric Vehicles (EVs) keeps increasing, making energy management of EV charging stations critically important. While previous studies have managed to reduce energy cost of EV charging while maintaining grid stability, they often overlook the robustness of EV charging management against uncertainties of various forms, such as varying charging behaviors and possible faults in faults in some chargers. To address the gap, a novel Multi-Agent Reinforcement Learning (MARL) approach is proposed treating each charger to be an agent and coordinate all the agents in the EV charging station with solar photovoltaics in a more realistic scenario, where system faults may occur. A Long Short-Term Memory (LSTM) network is incorporated in the MARL algorithm to extract temporal features from time-series. Additionally, a dense reward mechanism is designed for training the agents in the MARL algorithm to improve EV charging experience. Through validation on a real-world dataset, we show that our approach is robust against system uncertainties and faults and also effective in minimizing EV charging costs and maximizing charging service satisfaction.
comment: 2024 IEEE International Smart Cities Conference (ISC2)
Optimization-Based Trajectory Planning for Tractor-Trailer Vehicles on Curvy Roads: A Progressively Increasing Sampling Number Method
In this work, we propose an optimization-based trajectory planner for tractor-trailer vehicles on curvy roads. The lack of analytical expression for the trailer's errors to the center line pose a great challenge to the trajectory planning for tractor-trailer vehicles. To address this issue, we first use geometric representations to characterize the lateral and orientation errors in Cartesian frame, where the errors would serve as the components of the cost function and the road edge constraints within our optimization process. Next, we generate a coarse trajectory to warm-start the subsequent optimization problems. On the other hand, to achieve a good approximation of the continuous-time kinematics, optimization-based methods usually discretize the kinematics with a large sampling number. This leads to an increase in the number of the variables and constraints, thus making the optimization problem difficult to solve. To address this issue, we design a Progressively Increasing Sampling Number Optimization (PISNO) framework. More specifically, we first find a nearly feasible trajectory with a small sampling number to warm-start the optimization process. Then, the sampling number is progressively increased, and the corresponding intermediate Optimal Control Problem (OCP) is solved in each iteration. Next, we further resample the obtained solution into a finer sampling period, and then use it to warm-start the intermediate OCP in next iteration. This process is repeated until reaching a threshold sampling number. Simulation and experiment results show the proposed method exhibits a good performance and less computational consumption over the benchmarks.
A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond
Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been demonstrated, achieving higher data rates and energy efficiencies requires high-order coherent modulations like 16-QAM. Current coherent links rely on energy-intensive digital signal processing (DSP) for channel impairment compensation and carrier phase recovery (CPR), which consumes approximately 50pJ/b - 10x higher than future intra-data center requirements. For shorter links, simpler or DSP-free CPR methods can significantly reduce power and complexity. While Costas loops enable CPR for QPSK, they face challenges in scaling to higher-order modulations (e.g., 16/64-QAM) due to varying symbol amplitudes. In this work, we propose an optical coherent link architecture using laser forwarding and a novel DSP-free CPR system using offset-QAM modulation. The proposed analog CPR feedback loop is highly scalable, capable of supporting arbitrary offset-QAM modulations without requiring architectural modifications. This scalability is achieved through its phase error detection mechanism, which operates independently of the data rate and modulation type. We validated this method using GlobalFoundry's monolithic 45nm silicon photonics PDK models, with circuit- and system-level implementation at 100GBaud in the O-band. We will investigate the feedback loop dynamics, circuit-level implementations, and phase-noise performance of the proposed CPR loop. Our method can be adopted to realize low-power QAM optical interconnects for future coherent-lite pluggable transceivers as well as co-packaged optics (CPO) applications.
A Unifying Perspective for Safety of Stochastic Systems: From Barrier Functions to Finite Abstractions
Providing safety guarantees for stochastic dynamical systems is a central problem in various fields, including control theory, machine learning, and robotics. Existing methods either employ Stochastic Barrier Functions (SBFs) or rely on numerical approaches based on finite abstractions. SBFs, analogous to Lyapunov functions, are used to establish (probabilistic) set invariance, whereas abstraction-based approaches approximate the stochastic system with a finite model to compute safety probability bounds. This paper presents a unifying perspective on these seemingly different approaches. Specifically, we show that both methods can be interpreted as approximations of a stochastic dynamic programming problem. This perspective allows us to formally establish the correctness of both techniques, characterize their convergence and optimality properties, and analyze their respective assumptions, advantages, and limitations. Our analysis reveals that, unlike SBFs-based methods, abstraction-based approaches can provide asymptotically optimal safety certificates, albeit at the cost of increased computational effort.
comment: Submitted to IEEE Transaction on Automatic Control
A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning
We present a large real-world dataset obtained from monitoring a smart company facility over the course of six years, from 2018 to 2023. The dataset includes energy consumption data from various facility areas and components, energy production data from a photovoltaic system and a combined heat and power plant, operational data from heating and cooling systems, and weather data from an on-site weather station. The measurement sensors installed throughout the facility are organized in a hierarchical metering structure with multiple sub-metering levels, which is reflected in the dataset. The dataset contains measurement data from 72 energy meters, 9 heat meters and a weather station. Both raw and processed data at different processing levels, including labeled issues, is available. In this paper, we describe the data acquisition and post-processing employed to create the dataset. The dataset enables the application of a wide range of methods in the domain of energy management, including optimization, modeling, and machine learning to optimize building operations and reduce costs and carbon emissions.
comment: 19 pages, 9 figures
Dual Acceleration for Minimax Optimization: Linear Convergence Under Relaxed Assumptions
This paper addresses the bilinearly coupled minimax optimization problem: $\min_{x \in \mathbb{R}^{d_x}}\max_{y \in \mathbb{R}^{d_y}} \ f_1(x) + f_2(x) + y^{\top} Bx - g_1(y) - g_2(y)$, where $f_1$ and $g_1$ are smooth convex functions, $f_2$ and $g_2$ are potentially nonsmooth convex functions, and $B$ is a coupling matrix. Existing algorithms for solving this problem achieve linear convergence only under stronger conditions, which may not be met in many scenarios. We first introduce the Primal-Dual Proximal Gradient (PDPG) method and demonstrate that it converges linearly under an assumption where existing algorithms fail to achieve linear convergence. Building on insights gained from analyzing the convergence conditions of existing algorithms and PDPG, we further propose the inexact Dual Accelerated Proximal Gradient (iDAPG) method. This method achieves linear convergence under weaker conditions than those required by existing approaches. Moreover, even in cases where existing methods guarantee linear convergence, iDAPG can still provide superior theoretical performance in certain scenarios.
Data-Driven Subsynchronous Oscillation Suppression for Renewable Energy Integrated Power Systems Based on Koopman Operator
Recently, subsynchronous oscillations (SSOs) have emerged frequently worldwide, with the high penetration of renewable power generation in modern power systems. The SSO introduced by renewables has become a prominent new stability problem, seriously threatening the stable operation of systems. This paper proposes a data-driven dynamic optimal controller for renewable energy integrated power systems, to suppress SSOs with the control of renewables. The challenges of the controller design are the nonlinearity, complexity and hard accessibility of the system models. Using Koopman operator, the system dynamics are accurately extracted from data and utilized to the linear model predictive control (MPC). Firstly, the globally linear representation of the system dynamics is obtained by lifting, and the key states are selected as control signals by analyzing Koopman participation factors. Subsequently, augmented with the control term, the Koopman linear parameter-varying predictor of the controlled system is constructed. Finally, using MPC, the proposed controller computes control signals online in a moving horizon fashion. Case studies show that the proposed controller is effective, adaptive and robust in various conditions, surpassing other controllers with reliable control performance.
Robotics
Rotational Multi-material 3D Printing of Soft Robotic Matter with Asymmetrical Embedded Pneumatics
The rapid design and fabrication of soft robotic matter is of growing interest for shape morphing, actuation, and wearable devices. Here, we report a facile fabrication method for creating soft robotic materials with embedded pneumatics that exhibit programmable shape morphing behavior. Using rotational multi-material 3D printing, asymmetrical core-shell filaments composed of elastomeric shells and fugitive cores are patterned in 1D and 2D motifs. By precisely controlling the nozzle design, rotation rate, and print path, one can control the local orientation, shape, and cross-sectional area of the patterned fugitive core along each printed filament. Once the elastomeric matrix is cured, the fugitive cores are removed, leaving behind embedded conduits that facilitate pneumatic actuation. Using a connected Fermat spirals pathing approach, one can automatically generate desired print paths required for more complex soft robots, such as hand-inspired grippers. Our integrated design and printing approach enables one to rapidly build soft robotic matter that exhibits myriad shape morphing transitions on demand.
comment: 33 pages, 5 main text figures, 4 supporting text figures, 10 movies (movie repository link - https://drive.google.com/drive/folders/11SKVM-4viqbc-MqxsXlotrn01nusKMkY?usp=sharing)
What Do You Need for Diverse Trajectory Stitching in Diffusion Planning?
In planning, stitching is an ability of algorithms to piece together sub-trajectories of data they are trained on to generate new and diverse behaviours. While stitching is historically a strength of offline reinforcement learning, recent generative behavioural cloning (BC) methods have also shown proficiency at stitching. However, the main factors behind this are poorly understood, hindering the development of new algorithms that can reliably stitch. Focusing on diffusion planners trained via BC, we find two properties are needed to compose: \emph{positional equivariance} and \emph{local receptiveness}. We use these two properties to explain architecture, data, and inference choices in existing generative BC methods based on diffusion planning, including replanning frequency, data augmentation, and data scaling. Experimental comparisions show that (1) while locality is more important than positional equivariance in creating a diffusion planner capable of composition, both are crucial (2) enabling these properties through relatively simple architecture choices can be competitive with more computationally expensive methods such as replanning or scaling data, and (3) simple inpainting-based guidance can guide architecturally compositional models to enable generalization in goal-conditioned settings.
comment: 9 Pages
Linear Mixture Distributionally Robust Markov Decision Processes
Many real-world decision-making problems face the off-dynamics challenge: the agent learns a policy in a source domain and deploys it in a target domain with different state transitions. The distributionally robust Markov decision process (DRMDP) addresses this challenge by finding a robust policy that performs well under the worst-case environment within a pre-specified uncertainty set of transition dynamics. Its effectiveness heavily hinges on the proper design of these uncertainty sets, based on prior knowledge of the dynamics. In this work, we propose a novel linear mixture DRMDP framework, where the nominal dynamics is assumed to be a linear mixture model. In contrast with existing uncertainty sets directly defined as a ball centered around the nominal kernel, linear mixture DRMDPs define the uncertainty sets based on a ball around the mixture weighting parameter. We show that this new framework provides a more refined representation of uncertainties compared to conventional models based on $(s,a)$-rectangularity and $d$-rectangularity, when prior knowledge about the mixture model is present. We propose a meta algorithm for robust policy learning in linear mixture DRMDPs with general $f$-divergence defined uncertainty sets, and analyze its sample complexities under three divergence metrics instantiations: total variation, Kullback-Leibler, and $\chi^2$ divergences. These results establish the statistical learnability of linear mixture DRMDPs, laying the theoretical foundation for future research on this new setting.
comment: 26 pages, 7 figures
Knot So Simple: A Minimalistic Environment for Spatial Reasoning
We propose KnotGym, an interactive environment for complex, spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks with varying levels of complexity, all requiring acting from pure image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods of different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.
ExoGait-MS: Learning Periodic Dynamics with Multi-Scale Graph Network for Exoskeleton Gait Recognition
Current exoskeleton control methods often face challenges in delivering personalized treatment. Standardized walking gaits can lead to patient discomfort or even injury. Therefore, personalized gait is essential for the effectiveness of exoskeleton robots, as it directly impacts their adaptability, comfort, and rehabilitation outcomes for individual users. To enable personalized treatment in exoskeleton-assisted therapy and related applications, accurate recognition of personal gait is crucial for implementing tailored gait control. The key challenge in gait recognition lies in effectively capturing individual differences in subtle gait features caused by joint synergy, such as step frequency and step length. To tackle this issue, we propose a novel approach, which uses Multi-Scale Global Dense Graph Convolutional Networks (GCN) in the spatial domain to identify latent joint synergy patterns. Moreover, we propose a Gait Non-linear Periodic Dynamics Learning module to effectively capture the periodic characteristics of gait in the temporal domain. To support our individual gait recognition task, we have constructed a comprehensive gait dataset that ensures both completeness and reliability. Our experimental results demonstrate that our method achieves an impressive accuracy of 94.34% on this dataset, surpassing the current state-of-the-art (SOTA) by 3.77%. This advancement underscores the potential of our approach to enhance personalized gait control in exoskeleton-assisted therapy.
Classification of assembly tasks combining multiple primitive actions using Transformers and xLSTMs
The classification of human-performed assembly tasks is essential in collaborative robotics to ensure safety, anticipate robot actions, and facilitate robot learning. However, achieving reliable classification is challenging when segmenting tasks into smaller primitive actions is unfeasible, requiring us to classify long assembly tasks that encompass multiple primitive actions. In this study, we propose classifying long assembly sequential tasks based on hand landmark coordinates and compare the performance of two well-established classifiers, LSTM and Transformer, as well as a recent model, xLSTM. We used the HRC scenario proposed in the CT benchmark, which includes long assembly tasks that combine actions such as insertions, screw fastenings, and snap fittings. Testing was conducted using sequences gathered from both the human operator who performed the training sequences and three new operators. The testing results of real-padded sequences for the LSTM, Transformer, and xLSTM models was 72.9%, 95.0% and 93.2% for the training operator, and 43.5%, 54.3% and 60.8% for the new operators, respectively. The LSTM model clearly underperformed compared to the other two approaches. As expected, both the Transformer and xLSTM achieved satisfactory results for the operator they were trained on, though the xLSTM model demonstrated better generalization capabilities to new operators. The results clearly show that for this type of classification, the xLSTM model offers a slight edge over Transformers.
Is Single-View Mesh Reconstruction Ready for Robotics?
This paper evaluates single-view mesh reconstruction models for creating digital twin environments in robot manipulation. Recent advances in computer vision for 3D reconstruction from single viewpoints present a potential breakthrough for efficiently creating virtual replicas of physical environments for robotics contexts. However, their suitability for physics simulations and robotics applications remains unexplored. We establish benchmarking criteria for 3D reconstruction in robotics contexts, including handling typical inputs, producing collision-free and stable reconstructions, managing occlusions, and meeting computational constraints. Our empirical evaluation using realistic robotics datasets shows that despite success on computer vision benchmarks, existing approaches fail to meet robotics-specific requirements. We quantitively examine limitations of single-view reconstruction for practical robotics implementation, in contrast to prior work that focuses on multi-view approaches. Our findings highlight critical gaps between computer vision advances and robotics needs, guiding future research at this intersection.
comment: 20 pages, 17 figures
Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions
Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.
comment: In review
Object Classification Utilizing Neuromorphic Proprioceptive Signals in Active Exploration: Validated on a Soft Anthropomorphic Hand
Proprioception, a key sensory modality in haptic perception, plays a vital role in perceiving the 3D structure of objects by providing feedback on the position and movement of body parts. The restoration of proprioceptive sensation is crucial for enabling in-hand manipulation and natural control in the prosthetic hand. Despite its importance, proprioceptive sensation is relatively unexplored in an artificial system. In this work, we introduce a novel platform that integrates a soft anthropomorphic robot hand (QB SoftHand) with flexible proprioceptive sensors and a classifier that utilizes a hybrid spiking neural network with different types of spiking neurons to interpret neuromorphic proprioceptive signals encoded by a biological muscle spindle model. The encoding scheme and the classifier are implemented and tested on the datasets we collected in the active exploration of ten objects from the YCB benchmark. Our results indicate that the classifier achieves more accurate inferences than existing learning approaches, especially in the early stage of the exploration. This system holds the potential for development in the areas of haptic feedback and neural prosthetics.
A Bio-mimetic Neuromorphic Model for Heat-evoked Nociceptive Withdrawal Reflex in Upper Limb
The nociceptive withdrawal reflex (NWR) is a mechanism to mediate interactions and protect the body from damage in a potentially dangerous environment. To better convey warning signals to users of prosthetic arms or autonomous robots and protect them by triggering a proper NWR, it is useful to use a biological representation of temperature information for fast and effective processing. In this work, we present a neuromorphic spiking network for heat-evoked NWR by mimicking the structure and encoding scheme of the reflex arc. The network is trained with the bio-plausible reward modulated spike timing-dependent plasticity learning algorithm. We evaluated the proposed model and three other methods in recent studies that trigger NWR in an experiment with radiant heat. We found that only the neuromorphic model exhibits the spatial summation (SS) effect and temporal summation (TS) effect similar to humans and can encode the reflex strength matching the intensity of the stimulus in the relative spike latency online. The improved bio-plausibility of this neuromorphic model could improve sensory feedback in neural prostheses.
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that our Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies
We present a hierarchical policy-learning framework that enables a legged humanoid to cooperatively carry extended loads with a human partner using only haptic cues for intent inference. At the upper tier, a lightweight behavior-cloning network consumes six-axis force/torque streams from dual wrist-mounted sensors and outputs whole-body planar velocity commands that capture the leader's applied forces. At the lower tier, a deep-reinforcement-learning policy, trained under randomized payloads (0-3 kg) and friction conditions in Isaac Gym and validated in MuJoCo and on a real Unitree G1, maps these high-level twists to stable, under-load joint trajectories. By decoupling intent interpretation (force -> velocity) from legged locomotion (velocity -> joints), our method combines intuitive responsiveness to human inputs with robust, load-adaptive walking. We collect training data without motion-capture or markers, only synchronized RGB video and F/T readings, employing SAM2 and WHAM to extract 3D human pose and velocity. In real-world trials, our humanoid achieves cooperative carry-and-move performance (completion time, trajectory deviation, velocity synchrony, and follower-force) on par with a blindfolded human-follower baseline. This work is the first to demonstrate learned haptic guidance fused with full-body legged control for fluid human-humanoid co-manipulation. Code and videos are available on the H2-COMPACT website.
comment: Code and videos available at https://h2compact.github.io/h2compact/
MinkUNeXt-SI: Improving point cloud-based place recognition including spherical coordinates and LiDAR intensity
In autonomous navigation systems, the solution of the place recognition problem is crucial for their safe functioning. But this is not a trivial solution, since it must be accurate regardless of any changes in the scene, such as seasonal changes and different weather conditions, and it must be generalizable to other environments. This paper presents our method, MinkUNeXt-SI, which, starting from a LiDAR point cloud, preprocesses the input data to obtain its spherical coordinates and intensity values normalized within a range of 0 to 1 for each point, and it produces a robust place recognition descriptor. To that end, a deep learning approach that combines Minkowski convolutions and a U-net architecture with skip connections is used. The results of MinkUNeXt-SI demonstrate that this method reaches and surpasses state-of-the-art performance while it also generalizes satisfactorily to other datasets. Additionally, we showcase the capture of a custom dataset and its use in evaluating our solution, which also achieves outstanding results. Both the code of our solution and the runs of our dataset are publicly available for reproducibility purposes.
Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras
With the growing adoption of autonomous driving, the advancement of sensor technology is crucial for ensuring safety and reliable operation. Sensor fusion techniques that combine multiple sensors such as LiDAR, radar, and cameras have proven effective, but the integration of multiple devices increases both hardware complexity and cost. Therefore, developing a single sensor capable of performing multiple roles is highly desirable for cost-efficient and scalable autonomous driving systems. Event cameras have emerged as a promising solution due to their unique characteristics, including high dynamic range, low latency, and high temporal resolution. These features enable them to perform well in challenging lighting conditions, such as low-light or backlit environments. Moreover, their ability to detect fine-grained motion events makes them suitable for applications like pedestrian detection and vehicle-to-infrastructure communication via visible light. In this study, we present a method for distance estimation using a monocular event camera and a roadside LED bar. By applying a phase-only correlation technique to the event data, we achieve sub-pixel precision in detecting the spatial shift between two light sources. This enables accurate triangulation-based distance estimation without requiring stereo vision. Field experiments conducted in outdoor driving scenarios demonstrated that the proposed approach achieves over 90% success rate with less than 0.5-meter error for distances ranging from 20 to 60 meters. Future work includes extending this method to full position estimation by leveraging infrastructure such as smart poles equipped with LEDs, enabling event-camera-based vehicles to determine their own position in real time. This advancement could significantly enhance navigation accuracy, route optimization, and integration into intelligent transportation systems.
comment: 6 pages, 7 figures. To appear in IEEE Intelligent Vehicles Symposium (IV) 2025
CU-Multi: A Dataset for Multi-Robot Data Association
Multi-robot systems (MRSs) are valuable for tasks such as search and rescue due to their ability to coordinate over shared observations. A central challenge in these systems is aligning independently collected perception data across space and time, i.e., multi-robot data association. While recent advances in collaborative SLAM (C-SLAM), map merging, and inter-robot loop closure detection have significantly progressed the field, evaluation strategies still predominantly rely on splitting a single trajectory from single-robot SLAM datasets into multiple segments to simulate multiple robots. Without careful consideration to how a single trajectory is split, this approach will fail to capture realistic pose-dependent variation in observations of a scene inherent to multi-robot systems. To address this gap, we present CU-Multi, a multi-robot dataset collected over multiple days at two locations on the University of Colorado Boulder campus. Using a single robotic platform, we generate four synchronized runs with aligned start times and deliberate percentages of trajectory overlap. CU-Multi includes RGB-D, GPS with accurate geospatial heading, and semantically annotated LiDAR data. By introducing controlled variations in trajectory overlap and dense lidar annotations, CU-Multi offers a compelling alternative for evaluating methods in multi-robot data association. Instructions on accessing the dataset, support code, and the latest updates are publicly available at https://arpg.github.io/cumulti
comment: 8 pages, 6 figures, 4 tables
DTRT: Enhancing Human Intent Estimation and Role Allocation for Physical Human-Robot Collaboration
In physical Human-Robot Collaboration (pHRC), accurate human intent estimation and rational human-robot role allocation are crucial for safe and efficient assistance. Existing methods that rely on short-term motion data for intention estimation lack multi-step prediction capabilities, hindering their ability to sense intent changes and adjust human-robot assignments autonomously, resulting in potential discrepancies. To address these issues, we propose a Dual Transformer-based Robot Trajectron (DTRT) featuring a hierarchical architecture, which harnesses human-guided motion and force data to rapidly capture human intent changes, enabling accurate trajectory predictions and dynamic robot behavior adjustments for effective collaboration. Specifically, human intent estimation in DTRT uses two Transformer-based Conditional Variational Autoencoders (CVAEs), incorporating robot motion data in obstacle-free case with human-guided trajectory and force for obstacle avoidance. Additionally, Differential Cooperative Game Theory (DCGT) is employed to synthesize predictions based on human-applied forces, ensuring robot behavior align with human intention. Compared to state-of-the-art (SOTA) methods, DTRT incorporates human dynamics into long-term prediction, providing an accurate understanding of intention and enabling rational role allocation, achieving robot autonomy and maneuverability. Experiments demonstrate DTRT's accurate intent estimation and superior collaboration performance.
HEPP: Hyper-efficient Perception and Planning for High-speed Obstacle Avoidance of UAVs
High-speed obstacle avoidance of uncrewed aerial vehicles (UAVs) in cluttered environments is a significant challenge. Existing UAV planning and obstacle avoidance systems can only fly at moderate speeds or at high speeds over empty or sparse fields. In this article, we propose a hyper-efficient perception and planning system for the high-speed obstacle avoidance of UAVs. The system mainly consists of three modules: 1) A novel incremental robocentric mapping method with distance and gradient information, which takes 89.5% less time compared to existing methods. 2) A novel obstacle-aware topological path search method that generates multiple distinct paths. 3) An adaptive gradient-based high-speed trajectory generation method with a novel time pre-allocation algorithm. With these innovations, the system has an excellent real-time performance with only milliseconds latency in each iteration, taking 79.24% less time than existing methods at high speeds (15 m/s in cluttered environments), allowing UAVs to fly swiftly and avoid obstacles in cluttered environments. The planned trajectory of the UAV is close to the global optimum in both temporal and spatial domains. Finally, extensive validations in both simulation and real-world experiments demonstrate the effectiveness of our proposed system for high-speed navigation in cluttered environments.
Dynamic Manipulation of Deformable Objects in 3D: Simulation, Benchmark and Learning Strategy
Goal-conditioned dynamic manipulation is inherently challenging due to complex system dynamics and stringent task constraints, particularly in deformable object scenarios characterized by high degrees of freedom and underactuation. Prior methods often simplify the problem to low-speed or 2D settings, limiting their applicability to real-world 3D tasks. In this work, we explore 3D goal-conditioned rope manipulation as a representative challenge. To mitigate data scarcity, we introduce a novel simulation framework and benchmark grounded in reduced-order dynamics, which enables compact state representation and facilitates efficient policy learning. Building on this, we propose Dynamics Informed Diffusion Policy (DIDP), a framework that integrates imitation pretraining with physics-informed test-time adaptation. First, we design a diffusion policy that learns inverse dynamics within the reduced-order space, enabling imitation learning to move beyond na\"ive data fitting and capture the underlying physical structure. Second, we propose a physics-informed test-time adaptation scheme that imposes kinematic boundary conditions and structured dynamics priors on the diffusion process, ensuring consistency and reliability in manipulation execution. Extensive experiments validate the proposed approach, demonstrating strong performance in terms of accuracy and robustness in the learned policy.
comment: 11 pages,
Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space
Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. While minimal demonstrations enable robotic action execution, achieving high success rates and generalization requires high cost, e.g., continuously adding data or incrementally conducting human-in-loop processes with complex hardware/software systems. In this paper, we rethink the state/action space of the data collection pipeline as well as the underlying factors responsible for the prediction of non-robust actions. To this end, we introduce a Hierarchical Data Collection Space (HD-Space) for robotic imitation learning, a simple data collection scheme, endowing the model to train with proactive and high-quality data. Specifically, We segment the fine manipulation task into multiple key atomic tasks from a high-level perspective and design atomic state/action spaces for human demonstrations, aiming to generate robust IL data. We conduct empirical evaluations across two simulated and five real-world long-horizon manipulation tasks and demonstrate that IL policy training with HD-Space-based data can achieve significantly enhanced policy performance. HD-Space allows the use of a small amount of demonstration data to train a more powerful policy, particularly for long-horizon manipulation tasks. We aim for HD-Space to offer insights into optimizing data quality and guiding data scaling. project page: https://hd-space-robotics.github.io.
HACL: History-Aware Curriculum Learning for Fast Locomotion
We address the problem of agile and rapid locomotion, a key characteristic of quadrupedal and bipedal robots. We present a new algorithm that maintains stability and generates high-speed trajectories by considering the temporal aspect of locomotion. Our formulation takes into account past information based on a novel history-aware curriculum Learning (HACL) algorithm. We model the history of joint velocity commands with respect to the observed linear and angular rewards using a recurrent neural net (RNN). The hidden state helps the curriculum learn the relationship between the forward linear velocity and angular velocity commands and the rewards over a given time-step. We validate our approach on the MIT Mini Cheetah,Unitree Go1, and Go2 robots in a simulated environment and on a Unitree Go1 robot in real-world scenarios. In practice, HACL achieves peak forward velocity of 6.7 m/s for a given command velocity of 7m/s and outperforms prior locomotion algorithms by nearly 20%.
McARL:Morphology-Control-Aware Reinforcement Learning for Generalizable Quadrupedal Locomotion
We present Morphology-Control-Aware Reinforcement Learning (McARL), a new approach to overcome challenges of hyperparameter tuning and transfer loss, enabling generalizable locomotion across robot morphologies. We use a morphology-conditioned policy by incorporating a randomized morphology vector, sampled from a defined morphology range, into both the actor and critic networks. This allows the policy to learn parameters that generalize to robots with similar characteristics. We demonstrate that a single policy trained on a Unitree Go1 robot using McARL can be transferred to a different morphology (e.g., Unitree Go2 robot) and can achieve zero-shot transfer velocity of up to 3.5 m/s without retraining or fine-tuning. Moreover, it achieves 6.0 m/s on the training Go1 robot and generalizes to other morphologies like A1 and Mini Cheetah. We also analyze the impact of morphology distance on transfer performance and highlight McARL's advantages over prior approaches. McARL achieves 44-150% higher transfer performance on Go2, Mini Cheetah, and A1 compared to PPO variants.
Reinforcement Learning for Ballbot Navigation in Uneven Terrain
Ballbot (i.e. Ball balancing robot) navigation usually relies on methods rooted in control theory (CT), and works that apply Reinforcement learning (RL) to the problem remain rare while generally being limited to specific subtasks (e.g. balance recovery). Unlike CT based methods, RL does not require (simplifying) assumptions about environment dynamics (e.g. the absence of slippage between the ball and the floor). In addition to this increased accuracy in modeling, RL agents can easily be conditioned on additional observations such as depth-maps without the need for explicit formulations from first principles, leading to increased adaptivity. Despite those advantages, there has been little to no investigation into the capabilities, data-efficiency and limitations of RL based methods for ballbot control and navigation. Furthermore, there is a notable absence of an open-source, RL-friendly simulator for this task. In this paper, we present an open-source ballbot simulation based on MuJoCo, and show that with appropriate conditioning on exteroceptive observations as well as reward shaping, policies learned by classical model-free RL methods are capable of effectively navigating through randomly generated uneven terrain, using a reasonable amount of data (four to five hours on a system operating at 500hz).
comment: 6 pages, 8 figures, 2 tables
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Pre-trained Large Language Models (LLMs) have shown promise in solving planning problems but often struggle to ensure plan correctness, especially for long-horizon tasks. Meanwhile, traditional robotic task and motion planning (TAMP) frameworks address these challenges more reliably by combining high-level symbolic search with low-level motion planning. At the core of TAMP is the planning domain, an abstract world representation defined through symbolic predicates and actions. However, creating these domains typically involves substantial manual effort and domain expertise, limiting generalizability. We introduce Planning Domain Derivation with LLMs (PDDLLM), a novel approach that combines simulated physical interaction with LLM reasoning to improve planning performance. The method reduces reliance on humans by inferring planning domains from a single annotated task-execution demonstration. Unlike prior domain-inference methods that rely on partially predefined or language descriptions of planning domains, PDDLLM constructs domains entirely from scratch and automatically integrates them with low-level motion planning skills, enabling fully automated long-horizon planning. PDDLLM is evaluated on over 1,200 diverse tasks spanning nine environments and benchmarked against six LLM-based planning baselines, demonstrating superior long-horizon planning performance, lower token costs, and successful deployment on multiple physical robot platforms.
comment: 31 pages, 9 figures
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
Military weapon systems and command-and-control infrastructure augmented by artificial intelligence (AI) have seen rapid development and deployment in recent years. However, the sociotechnical impacts of AI on combat systems, military decision-making, and the norms of warfare have been understudied. We focus on a specific subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions. We refer to this subset as AI-powered lethal autonomous weapon systems (AI-LAWS) and argue that they introduce novel risks -- including unanticipated escalation, poor reliability in unfamiliar environments, and erosion of human oversight -- all of which threaten both military effectiveness and the openness of AI research. These risks cannot be addressed by high-level policy alone; effective regulation must be grounded in the technical behavior of AI models. We argue that AI researchers must be involved throughout the regulatory lifecycle. Thus, we propose a clear, behavior-based definition of AI-LAWS -- systems that introduce unique risks through their use of modern AI -- as a foundation for technically grounded regulation, given that existing frameworks do not distinguish them from conventional LAWS. Using this definition, we propose several technically-informed policy directions and invite greater participation from the AI research community in military AI policy discussions.
comment: 16 pages, 2 tables, 1 figure
ImLPR: Image-based LiDAR Place Recognition using Vision Foundation Models
LiDAR Place Recognition (LPR) is a key component in robotic localization, enabling robots to align current scans with prior maps of their environment. While Visual Place Recognition (VPR) has embraced Vision Foundation Models (VFMs) to enhance descriptor robustness, LPR has relied on task-specific models with limited use of pre-trained foundation-level knowledge. This is due to the lack of 3D foundation models and the challenges of using VFM with LiDAR point clouds. To tackle this, we introduce ImLPR, a novel pipeline that employs a pre-trained DINOv2 VFM to generate rich descriptors for LPR. To our knowledge, ImLPR is the first method to leverage a VFM to support LPR. ImLPR converts raw point clouds into Range Image Views (RIV) to leverage VFM in the LiDAR domain. It employs MultiConv adapters and Patch-InfoNCE loss for effective feature learning. We validate ImLPR using public datasets where it outperforms state-of-the-art (SOTA) methods in intra-session and inter-session LPR with top Recall@1 and F1 scores across various LiDARs. We also demonstrate that RIV outperforms Bird's-Eye-View (BEV) as a representation choice for adapting LiDAR for VFM. We release ImLPR as open source for the robotics community.
comment: Technical report, 22 Pages, 13 Figures and 12 Tables
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
comment: 9 pages, 8 figures, 5 tables
CrashAgent: Crash Scenario Generation via Multi-modal Reasoning
Training and evaluating autonomous driving algorithms requires a diverse range of scenarios. However, most available datasets predominantly consist of normal driving behaviors demonstrated by human drivers, resulting in a limited number of safety-critical cases. This imbalance, often referred to as a long-tail distribution, restricts the ability of driving algorithms to learn from crucial scenarios involving risk or failure, scenarios that are essential for humans to develop driving skills efficiently. To generate such scenarios, we utilize Multi-modal Large Language Models to convert crash reports of accidents into a structured scenario format, which can be directly executed within simulations. Specifically, we introduce CrashAgent, a multi-agent framework designed to interpret multi-modal real-world traffic crash reports for the generation of both road layouts and the behaviors of the ego vehicle and surrounding traffic participants. We comprehensively evaluate the generated crash scenarios from multiple perspectives, including the accuracy of layout reconstruction, collision rate, and diversity. The resulting high-quality and large-scale crash dataset will be publicly available to support the development of safe driving algorithms in handling safety-critical situations.
A Coarse to Fine 3D LiDAR Localization with Deep Local Features for Long Term Robot Navigation in Large Environments
The location of a robot is a key aspect in the field of mobile robotics. This problem is particularly complex when the initial pose of the robot is unknown. In order to find a solution, it is necessary to perform a global localization. In this paper, we propose a method that addresses this problem using a coarse-to-fine solution. The coarse localization relies on a probabilistic approach of the Monte Carlo Localization (MCL) method, with the contribution of a robust deep learning model, the MinkUNeXt neural network, to produce a robust description of point clouds of a 3D LiDAR within the observation model. For fine localization, global point cloud registration has been implemented. MinkUNeXt aids this by exploiting the outputs of its intermediate layers to produce deep local features for each point in a scan. These features facilitate precise alignment between the current sensor observation and one of the point clouds on the map. The proposed MCL method incorporating Deep Local Features for fine localization is termed MCL-DLF. Alternatively, a classical ICP method has been implemented for this precise localization aiming at comparison purposes. This method is termed MCL-ICP. In order to validate the performance of MCL-DLF method, it has been tested on publicly available datasets such as the NCLT dataset, which provides seasonal large-scale environments. Additionally, tests have been also performed with own data (UMH) that also includes seasonal variations on large indoor/outdoor scenarios. The results, which were compared with established state-of-the-art methodologies, demonstrate that the MCL-DLF method obtains an accurate estimate of the robot localization in dynamic environments despite changes in environmental conditions. For reproducibility purposes, the code is publicly available at https://github.com/miriammaximo/MCL-DLF.git
Towards Natural Language Communication for Cooperative Autonomous Driving via Self-Play
Past work has demonstrated that autonomous vehicles can drive more safely if they communicate with one another than if they do not. However, their communication has often not been human-understandable. Using natural language as a vehicle-to-vehicle (V2V) communication protocol offers the potential for autonomous vehicles to drive cooperatively not only with each other but also with human drivers. In this work, we propose a suite of traffic tasks in autonomous driving where vehicles in a traffic scenario need to communicate in natural language to facilitate coordination in order to avoid an imminent collision and/or support efficient traffic flow. To this end, this paper introduces a novel method, LLM+Debrief, to learn a message generation and high-level decision-making policy for autonomous vehicles through multi-agent discussion. To evaluate LLM agents for driving, we developed a gym-like simulation environment that contains a range of driving scenarios. Our experimental results demonstrate that LLM+Debrief is more effective at generating meaningful and human-understandable natural language messages to facilitate cooperation and coordination than a zero-shot LLM agent. Our code and demo videos are available at https://talking-vehicles.github.io/.
A Dataset and Benchmarks for Deep Learning-Based Optical Microrobot Pose and Depth Perception
Optical microrobots, manipulated via optical tweezers (OT), have broad applications in biomedicine. However, reliable pose and depth perception remain fundamental challenges due to the transparent or low-contrast nature of the microrobots, as well as the noisy and dynamic conditions of the microscale environments in which they operate. An open dataset is crucial for enabling reproducible research, facilitating benchmarking, and accelerating the development of perception models tailored to microscale challenges. Standardised evaluation enables consistent comparison across algorithms, ensuring objective benchmarking and facilitating reproducible research. Here, we introduce the OpTical MicroRobot dataset (OTMR), the first publicly available dataset designed to support microrobot perception under the optical microscope. OTMR contains 232,881 images spanning 18 microrobot types and 176 distinct poses. We benchmarked the performance of eight deep learning models, including architectures derived via neural architecture search (NAS), on two key tasks: pose classification and depth regression. Results indicated that Vision Transformer (ViT) achieve the highest accuracy in pose classification, while depth regression benefits from deeper architectures. Additionally, increasing the size of the training dataset leads to substantial improvements across both tasks, highlighting OTMR's potential as a foundational resource for robust and generalisable microrobot perception in complex microscale environments.
comment: Accepted by the 2025 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS)
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning ACL 2025
Large multimodal foundation models, particularly in the domains of language and vision, have significantly advanced various tasks, including robotics, autonomous driving, information retrieval, and grounding. However, many of these models perceive objects as indivisible, overlooking the components that constitute them. Understanding these components and their associated affordances provides valuable insights into an object's functionality, which is fundamental for performing a wide range of tasks. In this work, we introduce a novel real-world benchmark, InstructPart, comprising hand-labeled part segmentation annotations and task-oriented instructions to evaluate the performance of current models in understanding and executing part-level tasks within everyday contexts. Through our experiments, we demonstrate that task-oriented part segmentation remains a challenging problem, even for state-of-the-art Vision-Language Models (VLMs). In addition to our benchmark, we introduce a simple baseline that achieves a twofold performance improvement through fine-tuning with our dataset. With our dataset and benchmark, we aim to facilitate research on task-oriented part segmentation and enhance the applicability of VLMs across various domains, including robotics, virtual reality, information retrieval, and other related fields. Project website: https://zifuwan.github.io/InstructPart/.
comment: Accepted by ACL 2025 Main. Project page: https://zifuwan.github.io/InstructPart/
MorphEUS: Morphable Omnidirectional Unmanned System
Omnidirectional aerial vehicles (OMAVs) have opened up a wide range of possibilities for inspection, navigation, and manipulation applications using drones. In this paper, we introduce MorphEUS, a morphable co-axial quadrotor that can control position and orientation independently with high efficiency. It uses a paired servo motor mechanism for each rotor arm, capable of pointing the vectored-thrust in any arbitrary direction. As compared to the \textit{state-of-the-art} OMAVs, we achieve higher and more uniform force/torque reachability with a smaller footprint and minimum thrust cancellations. The overactuated nature of the system also results in resiliency to rotor or servo-motor failures. The capabilities of this quadrotor are particularly well-suited for contact-based infrastructure inspection and close-proximity imaging of complex geometries. In the accompanying control pipeline, we present theoretical results for full controllability, almost-everywhere exponential stability, and thrust-energy optimality. We evaluate our design and controller on high-fidelity simulations showcasing the trajectory-tracking capabilities of the vehicle during various tasks. Supplementary details and experimental videos are available on the project webpage.
Predictability-Based Curiosity-Guided Action Symbol Discovery
Discovering symbolic representations for skills is essential for abstract reasoning and efficient planning in robotics. Previous neuro-symbolic robotic studies mostly focused on discovering perceptual symbolic categories given a pre-defined action repertoire and generating plans with given action symbols. A truly developmental robotic system, on the other hand, should be able to discover all the abstractions required for the planning system with minimal human intervention. In this study, we propose a novel system that is designed to discover symbolic action primitives along with perceptual symbols autonomously. Our system is based on an encoder-decoder structure that takes object and action information as input and predicts the generated effect. To efficiently explore the vast continuous action parameter space, we introduce a Curiosity-Based exploration module that selects the most informative actions -- the ones that maximize the entropy in the predicted effect distribution. The discovered symbolic action primitives are then used to make plans using a symbolic tree search strategy in single- and double-object manipulation tasks. We compare our model with two baselines that use different exploration strategies in different experiments. The results show that our approach can learn a diverse set of symbolic action primitives, which are effective for generating plans in order to achieve given manipulation goals.
comment: Submitted to IEEE ICDL 2025
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs
With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing five core sub-skills: semantic perception, spatial perception, motion control, tool utilization, and task planning. Furthermore, we construct a hybrid testing platform that integrates static real-world environments with dynamic virtual scenarios, enabling comprehensive performance assessment of UAV-EAs across varied contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark will be released at https://github.com/lostwolves/BEDI .
LA-RCS: LLM-Agent-Based Robot Control System
LA-RCS (LLM-agent-based robot control system) is a sophisticated robot control system designed to autonomously plan, work, and analyze the external environment based on user requirements by utilizing LLM-Agent. Utilizing a dual-agent framework, LA-RCS generates plans based on user requests, observes the external environment, executes the plans, and modifies the plans as needed to adapt to changes in the external conditions. Additionally, LA-RCS interprets natural language commands by the user and converts them into commands compatible with the robot interface so that the robot can execute tasks and meet user requests properly. During his process, the system autonomously evaluates observation results, provides feedback on the tasks, and executes commands based on real-time environmental monitoring, significantly reducing the need for user intervention in fulfilling requests. We categorized the scenarios that LA-RCS needs to perform into four distinct types and conducted a quantitative assessment of its performance in each scenario. The results showed an average success rate of 90 percent, demonstrating the system capability to fulfill user requests satisfactorily. For more extensive results, readers can visit our project page: https://la-rcs.github.io
Manipulating Elasto-Plastic Objects With 3D Occupancy and Learning-Based Predictive Control
Manipulating elasto-plastic objects remains a significant challenge due to severe self-occlusion, difficulties of representation, and complicated dynamics. This work proposes a novel framework for elasto-plastic object manipulation with a quasi-static assumption for motions, leveraging 3D occupancy to represent such objects, a learned dynamics model trained with 3D occupancy, and a learning-based predictive control algorithm to address these challenges effectively. We build a novel data collection platform to collect full spatial information and propose a pipeline for generating a 3D occupancy dataset. To infer the 3D occupancy during manipulation, an occupancy prediction network is trained with multiple RGB images supervised by the generated dataset. We design a deep neural network empowered by a 3D convolution neural network (CNN) and a graph neural network (GNN) to predict the complex deformation with the inferred 3D occupancy results. A learning-based predictive control algorithm is introduced to plan the robot actions, incorporating a novel shape-based action initialization module specifically designed to improve the planner efficiency. The proposed framework in this paper can successfully shape the elasto-plastic objects into a given goal shape and has been verified in various experiments both in simulation and the real world.
comment: 8 Pages, 13 figures, accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Revisiting Adversarial Perception Attacks and Defense Methods on Autonomous Driving Systems
Autonomous driving systems (ADS) increasingly rely on deep learning-based perception models, which remain vulnerable to adversarial attacks. In this paper, we revisit adversarial attacks and defense methods, focusing on road sign recognition and lead object detection and prediction (e.g., relative distance). Using a Level-2 production ADS, OpenPilot by Comma$.$ai, and the widely adopted YOLO model, we systematically examine the impact of adversarial perturbations and assess defense techniques, including adversarial training, image processing, contrastive learning, and diffusion models. Our experiments highlight both the strengths and limitations of these methods in mitigating complex attacks. Through targeted evaluations of model robustness, we aim to provide deeper insights into the vulnerabilities of ADS perception systems and contribute guidance for developing more resilient defense strategies.
comment: 8 pages, 2 figures, To appear in the 8th Dependable and Secure Machine Learning Workshop (DSML 2025)
ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Using Lightweight Polygon Maps
This paper presents an effective and reliable pose tracking solution, termed ERPoT, for mobile robots operating in large-scale outdoor and challenging indoor environments, underpinned by an innovative prior polygon map. Especially, to overcome the challenge that arises as the map size grows with the expansion of the environment, the novel form of a prior map composed of multiple polygons is proposed. Benefiting from the use of polygons to concisely and accurately depict environmental occupancy, the prior polygon map achieves long-term reliable pose tracking while ensuring a compact form. More importantly, pose tracking is carried out under pure LiDAR mode, and the dense 3D point cloud is transformed into a sparse 2D scan through ground removal and obstacle selection. On this basis, a novel cost function for pose estimation through point-polygon matching is introduced, encompassing two distinct constraint forms: point-to-vertex and point-to-edge. In this study, our primary focus lies on two crucial aspects: lightweight and compact prior map construction, as well as effective and reliable robot pose tracking. Both aspects serve as the foundational pillars for future navigation across diverse mobile platforms equipped with different LiDAR sensors in varied environments. Comparative experiments based on the publicly available datasets and our self-recorded datasets are conducted, and evaluation results show the superior performance of ERPoT on reliability, prior map size, pose estimation error, and runtime over the other six approaches. The corresponding code can be accessed at https://github.com/ghm0819/ERPoT, and the supplementary video is at https://youtu.be/cseml5FrW1Q.
comment: 20 pages, 21 figures
Survival of the fastest -- algorithm-guided evolution of light-powered underwater microrobots
Depending on multiple parameters, soft robots can exhibit different modes of locomotion that are difficult to model numerically. As a result, improving their performance is complex, especially in small-scale systems characterized by low Reynolds numbers, when multiple aero- and hydrodynamical processes influence their movement. In this work, we optimize light-powered millimetre-scale underwater swimmer locomotion by applying experimental results - measured swimming speed - as the fitness function in two evolutionary algorithms: particle swarm optimization and genetic algorithm. As these soft, light-powered robots with different characteristics (phenotypes) can be fabricated quickly, they provide a great platform for optimisation experiments, using many competing robots to improve swimming speed over consecutive generations. Interestingly, just like in natural evolution, unexpected gene combinations led to surprisingly good results, including eight-fold increase in speed or the discovery of a self-oscillating underwater locomotion mode.
comment: 14 pages
Embodied intelligent industrial robotics: Concepts and techniques
In recent years, embodied intelligent robotics (EIR) advances significantly in multi-modal perception, autonomous decision-making, and physical interaction. Some robots have already been tested in general-purpose scenarios, such as homes and shopping malls. However, EIR currently lacks a deep understanding of the semantics of industrial environments and the normative constraints between industrial operating objects. The goal of this paper is to advance the research and application of embodied intelligence in industrial scenarios. This paper first reviews the history of industrial robotics and the mainstream EIR frameworks. Herein, the concept of embodied intelligent industrial robotics (EIIR) is formulated and a knowledge-driven EIIR technical framework for industrial environments is proposed. The framework includes four primary modules: a world model, a high-level task planner, a low-level skill controller, and a simulator. The development of techniques related to each module are also thoroughly discussed, and recent progress regarding their adaption to industrial applications is emphasized. Finally, the key challenges that EIIR encounters in industrial scenarios are summarized and future research directions are suggested. The authors believe that EIIR technology is shaping the next generation of industrial robotics. EIIR-based industrial systems have strong potential to enable intelligent manufacturing. It is expected that this review could serve as a valuable reference for scholars and engineers that are interested in industrial embodied intelligence. Together, scholars can use this research to drive the rapid advancement and application of EIIR techniques. The authors would continue to track and summarize new studies in the project page https://github.com/jackyzengl/EIIR.
comment: 62 pages, 11 figures. The associated project can be found at https://github.com/jackyzengl/EIIR
Koopman Operators in Robot Learning
Koopman operator theory offers a rigorous treatment of dynamics and has been emerging as an alternative modeling and learning-based control method across various robotics sub-domains. Due to its ability to represent nonlinear dynamics as a linear (but higher-dimensional) operator, Koopman theory offers a fresh lens through which to understand and tackle the modeling and control of complex robotic systems. Moreover, it enables incremental updates and is computationally inexpensive, thus making it particularly appealing for real-time applications and online active learning. This review delves deeply into the foundations of Koopman operator theory and systematically builds a bridge from theoretical principles to practical robotic applications. We begin by explaining the mathematical underpinnings of the Koopman framework and discussing approximation approaches for incorporating inputs into Koopman-based modeling. Foundational considerations, such as data collection strategies as well as the design of lifting functions for effective system embedding, are also discussed. We then explore how Koopman-based models serve as a unifying tool for a range of robotics tasks, including model-based control, real-time state estimation, and motion planning. The review proceeds to a survey of cutting-edge research that demonstrates the versatility and growing impact of Koopman methods across diverse robotics sub-domains: from aerial and legged platforms to manipulators, soft-bodied systems, and multi-agent networks. A presentation of more advanced theoretical topics, necessary to push forward the overall framework, is included. Finally, we reflect on some key open challenges that remain and articulate future research directions that will shape the next phase of Koopman-inspired robotics. To support practical adoption, we provide a hands-on tutorial with executable code at https://shorturl.at/ouE59.
comment: Submitted to TRO
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Bimanual Regrasp Planning and Control for Active Reduction of Object Pose Uncertainty
Precisely grasping an object is a challenging task due to pose uncertainties. Conventional methods have used cameras and fixtures to reduce object uncertainty. They are effective but require intensive preparation, such as designing jigs based on the object geometry and calibrating cameras with high-precision tools fabricated using lasers. In this study, we propose a method to reduce the uncertainty of the position and orientation of a grasped object without using a fixture or a camera. Our method is based on the concept that the flat finger pads of a parallel gripper can reduce uncertainty along its opening/closing direction through flat surface contact. Three orthogonal grasps by parallel grippers with flat finger pads collectively constrain an object's position and orientation to a unique state. Guided by the concepts, we develop a regrasp planning and admittance control approach that sequentially finds and leverages three orthogonal grasps of two robotic arms to actively reduce uncertainties in the object pose. We evaluated the proposed method on different initial object uncertainties and verified that it had good repeatability. The deviation levels of the experimental trials were on the same order of magnitude as those of an optical tracking system, demonstrating strong relative inference performance.
How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?
In the field of robotics and automation, navigation systems based on Large Language Models (LLMs) have recently demonstrated impressive performance. However, the security aspects of these systems have received relatively less attention. This paper pioneers the exploration of vulnerabilities in LLM-based navigation models in urban outdoor environments, a critical area given the widespread application of this technology in autonomous driving, logistics, and emergency services. Specifically, we introduce a novel Navigational Prompt Attack that manipulates LLM-based navigation models by perturbing the original navigational prompt, leading to incorrect actions. Based on the method of perturbation, our attacks are divided into two types: Navigational Prompt Insert (NPI) Attack and Navigational Prompt Swap (NPS) Attack. We conducted comprehensive experiments on an LLM-based navigation model that employs various LLMs for reasoning. Our results, derived from the Touchdown and Map2Seq street-view datasets under both few-shot learning and fine-tuning configurations, demonstrate notable performance declines across seven metrics in the face of both white-box and black-box attacks. Moreover, our attacks can be easily extended to other LLM-based navigation models with similarly effective results. These findings highlight the generalizability and transferability of the proposed attack, emphasizing the need for enhanced security in LLM-based navigation systems. As an initial countermeasure, we propose the Navigational Prompt Engineering (NPE) Defense strategy, which concentrates on navigation-relevant keywords to reduce the impact of adversarial attacks. While initial findings indicate that this strategy enhances navigational safety, there remains a critical need for the wider research community to develop stronger defense methods to effectively tackle the real-world challenges faced by these systems.
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Imitation learning has emerged as a promising approach towards building generalist robots. However, scaling imitation learning for large robot foundation models remains challenging due to its reliance on high-quality expert demonstrations. Meanwhile, large amounts of video data depicting a wide range of environments and diverse behaviors are readily available. This data provides a rich source of information about real-world dynamics and agent-environment interactions. Leveraging this data directly for imitation learning, however, has proven difficult due to the lack of action annotation. In this work, we present Unified World Models (UWM), a framework that allows for leveraging both video and action data for policy learning. Specifically, a UWM integrates an action diffusion process and a video diffusion process within a unified transformer architecture, where independent diffusion timesteps govern each modality. By controlling each diffusion timestep, UWM can flexibly represent a policy, a forward dynamics, an inverse dynamics, and a video generator. Through simulated and real-world experiments, we show that: (1) UWM enables effective pretraining on large-scale multitask robot datasets with both dynamics and action predictions, resulting in more generalizable and robust policies than imitation learning, (2) UWM naturally facilitates learning from action-free video data through independent control of modality-specific diffusion timesteps, further improving the performance of finetuned policies. Our results suggest that UWM offers a promising step toward harnessing large, heterogeneous datasets for scalable robot learning, and provides a simple unification between the often disparate paradigms of imitation learning and world modeling. Videos and code are available at https://weirdlabuw.github.io/uwm/.
VERDI: VLM-Embedded Reasoning for Autonomous Driving
While autonomous driving (AD) stacks struggle with decision making under partial observability and real-world complexity, human drivers are capable of commonsense reasoning to make near-optimal decisions with limited information. Recent work has attempted to leverage finetuned Vision-Language Models (VLMs) for trajectory planning at inference time to emulate human behavior. Despite their success in benchmark evaluations, these methods are often impractical to deploy (a 70B parameter VLM inference at merely 8 tokens per second requires more than 160G of memory), and their monolithic network structure prohibits safety decomposition. To bridge this gap, we propose VLM-Embedded Reasoning for autonomous Driving (VERDI), a training-time framework that distills the reasoning process and commonsense knowledge of VLMs into the AD stack. VERDI augments modular differentiable end-to-end (e2e) AD models by aligning intermediate module outputs at the perception, prediction, and planning stages with text features explaining the driving reasoning process produced by VLMs. By encouraging alignment in latent space, VERDI enables the modular AD stack to internalize structured reasoning, without incurring the inference-time costs of large VLMs. We demonstrate the effectiveness of our method on the NuScenes dataset and find that VERDI outperforms existing e2e methods that do not embed reasoning by 10% in $\ell_{2}$ distance, while maintaining high inference speed.
The One RING: a Robotic Indoor Navigation Generalist
Modern robots vary significantly in shape, size, and sensor configurations used to perceive and interact with their environments. However, most navigation policies are embodiment-specific--a policy trained on one robot typically fails to generalize to another, even with minor changes in body size or camera viewpoint. As custom hardware becomes increasingly common, there is a growing need for a single policy that generalizes across embodiments, eliminating the need to retrain for each specific robot. In this paper, we introduce RING (Robotic Indoor Navigation Generalist), an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator. Trained entirely in simulation, RING leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms. To support this, we augment the AI2-THOR simulator to instantiate robots with controllable configurations, varying in body size, rotation pivot point, and camera parameters. On the visual object-goal navigation task, RING achieves strong cross-embodiment (XE) generalization--72.1% average success rate across five simulated embodiments (a 16.7% absolute improvement on the Chores-S benchmark) and 78.9% across four real-world platforms, including Stretch RE-1, LoCoBot, and Unitree Go1--matching or even surpassing embodiment-specific policies. We further deploy RING on the RB-Y1 wheeled humanoid in a real-world kitchen environment, showcasing its out-of-the-box potential for mobile manipulation platforms. (Project website: https://one-ring-policy.allen.ai)
Agent-Based Simulation of UAV Battery Recharging for IoT Applications: Precision Agriculture, Disaster Recovery, and Dengue Vector Control
The low battery autonomy of Unnamed Aerial Vehicles (UAVs or drones) can make smart farming (precision agriculture), disaster recovery, and the fighting against dengue vector applications difficult. This article considers two approaches, first enumerating the characteristics observed in these three IoT application types and then modeling an UAV's battery recharge coordination using the Agent-Based Simulation (ABS) approach. In this way, we propose that each drone inside the swarm does not communicate concerning this recharge coordination decision, reducing energy usage and permitting remote usage. A total of 6000 simulations were run to evaluate how two proposed policies, the BaseLine (BL) and ChargerThershold (CT) coordination recharging policy, behave in 30 situations regarding how each simulation sets conclude the simulation runs and how much time they work until recharging results. CT policy shows more reliable results in extreme system usage. This work conclusion presents the potential of these three IoT applications to achieve their perpetual service without communication between drones and ground stations. This work can be a baseline for future policies and simulation parameter enhancements.
comment: 22 pages
Stable Object Placement Planning From Contact Point Robustness
We introduce a planner designed to guide robot manipulators in stably placing objects within intricate scenes. Our proposed method reverses the traditional approach to object placement: our planner selects contact points first and then determines a placement pose that solicits the selected points. This is instead of sampling poses, identifying contact points, and evaluating pose quality. Our algorithm facilitates stability-aware object placement planning, imposing no restrictions on object shape, convexity, or mass density homogeneity, while avoiding combinatorial computational complexity. Our proposed stability heuristic enables our planner to find a solution about 20 times faster when compared to the same algorithm not making use of the heuristic and eight times faster than a state-of-the-art method that uses the traditional sample-and-evaluate approach. Our proposed planner is also more successful in finding stable placements than the five other benchmarked algorithms. Derived from first principles and validated in ten real robot experiments, our planner offers a general and scalable method to tackle the problem of object placement planning with rigid objects.
comment: Accepted to IEEE Transactions on Robotics. Contains 15 pages, 13 figures, and 3 tables
Multiagent Systems
Facility Location with Public Locations and Private Doubly-Peaked Costs
In the facility location problem, the task is to place one or more facilities so as to minimize the sum of the agent costs for accessing their nearest facility. Heretofore, in the strategic version, agent locations have been assumed to be private, while their cost measures have been public and identical. For the most part, the cost measure has been the distance to the nearest facility. However, in multiple natural settings, such as placing a firehouse or a school, this modeling does not appear to be a good fit. For it seems natural that the agent locations would be known, but their costs might be private information. In addition, for these types of settings, agents may well want the nearest facility to be at the right distance: near, but not too near. This is captured by the doubly-peaked cost introduced by Filos-Ratsikas et al. (AAMAS 2017). In this paper, we re-examine the facility location problem from this perspective: known agent locations and private preferred distances to the nearest facility. We then give lower and upper bounds on achievable approximations, focusing on the problem in 1D, and in 2D with an $L_1$ distance measure.
Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions
Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.
comment: In review
Get Experience from Practice: LLM Agents with Record & Replay
AI agents, empowered by Large Language Models (LLMs) and communication protocols such as MCP and A2A, have rapidly evolved from simple chatbots to autonomous entities capable of executing complex, multi-step tasks, demonstrating great potential. However, the LLMs' inherent uncertainty and heavy computational resource requirements pose four significant challenges to the development of safe and efficient agents: reliability, privacy, cost and performance. Existing approaches, like model alignment, workflow constraints and on-device model deployment, can partially alleviate some issues but often with limitations, failing to fundamentally resolve these challenges. This paper proposes a new paradigm called AgentRR (Agent Record & Replay), which introduces the classical record-and-replay mechanism into AI agent frameworks. The core idea is to: 1. Record an agent's interaction trace with its environment and internal decision process during task execution, 2. Summarize this trace into a structured "experience" encapsulating the workflow and constraints, and 3. Replay these experiences in subsequent similar tasks to guide the agent's behavior. We detail a multi-level experience abstraction method and a check function mechanism in AgentRR: the former balances experience specificity and generality, while the latter serves as a trust anchor to ensure completeness and safety during replay. In addition, we explore multiple application modes of AgentRR, including user-recorded task demonstration, large-small model collaboration and privacy-aware agent execution, and envision an experience repository for sharing and reusing knowledge to further reduce deployment cost.
Multi-agent Systems for Misinformation Lifecycle : Detection, Correction And Source Identification
The rapid proliferation of misinformation in digital media demands solutions that go beyond isolated Large Language Model(LLM) or AI Agent based detection methods. This paper introduces a novel multi-agent framework that covers the complete misinformation lifecycle: classification, detection, correction, and source verification to deliver more transparent and reliable outcomes. In contrast to single-agent or monolithic architectures, our approach employs five specialized agents: an Indexer agent for dynamically maintaining trusted repositories, a Classifier agent for labeling misinformation types, an Extractor agent for evidence based retrieval and ranking, a Corrector agent for generating fact-based correction and a Verification agent for validating outputs and tracking source credibility. Each agent can be individually evaluated and optimized, ensuring scalability and adaptability as new types of misinformation and data sources emerge. By decomposing the misinformation lifecycle into specialized agents - our framework enhances scalability, modularity, and explainability. This paper proposes a high-level system overview, agent design with emphasis on transparency, evidence-based outputs, and source provenance to support robust misinformation detection and correction at scale.
An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems
Multi-agent AI systems (MAS) offer a promising framework for distributed intelligence, enabling collaborative reasoning, planning, and decision-making across autonomous agents. This paper provides a systematic outlook on the current opportunities and challenges of MAS, drawing insights from recent advances in large language models (LLMs), federated optimization, and human-AI interaction. We formalize key concepts including agent topology, coordination protocols, and shared objectives, and identify major risks such as dependency, misalignment, and vulnerabilities arising from training data overlap. Through a biologically inspired simulation and comprehensive theoretical framing, we highlight critical pathways for developing robust, scalable, and secure MAS in real-world settings.
Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation
Despite advances in designing personas for Large Language Models (LLM), challenges remain in aligning them with human cognitive processes and representing diverse stakeholder perspectives. We introduce a Social Cognitive Theory (SCT) agent design framework for designing, evaluating, and implementing psychologically grounded LLMs with consistent behavior. Our framework operationalizes SCT through four personal factors (cognitive, motivational, biological, and affective) for designing, six quantifiable constructs for evaluating, and a graph database-backed architecture for implementing stakeholder personas. Experiments tested agents' responses to contradicting information of varying reliability. In the highly polarized renewable energy transition discourse, we design five diverse agents with distinct ideologies, roles, and stakes to examine stakeholder representation. The evaluation of these agents in contradictory scenarios occurs through comprehensive processes that implement the SCT. Results show consistent response patterns ($R^2$ range: $0.58-0.61$) and systematic temporal development of SCT construct effects. Principal component analysis identifies two dimensions explaining $73$% of variance, validating the theoretical structure. Our framework offers improved explainability and reproducibility compared to black-box approaches. This work contributes to ongoing efforts to improve diverse stakeholder representation while maintaining psychological consistency in LLM personas.
Single-agent or Multi-agent Systems? Why Not Both?
Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported the superior accuracy performance of MAS across diverse domains, enabled by long-horizon context tracking and error correction through role-specific agents. However, the design and deployment of MAS incur higher complexity and runtime cost compared to single-agent systems (SAS). Meanwhile, frontier LLMs, such as OpenAI-o3 and Gemini-2.5-Pro, have rapidly advanced in long-context reasoning, memory retention, and tool usage, mitigating many limitations that originally motivated MAS designs. In this paper, we conduct an extensive empirical study comparing MAS and SAS across various popular agentic applications. We find that the benefits of MAS over SAS diminish as LLM capabilities improve, and we propose efficient mechanisms to pinpoint the error-prone agent in MAS. Furthermore, the performance discrepancy between MAS and SAS motivates our design of a hybrid agentic paradigm, request cascading between MAS and SAS, to improve both efficiency and capability. Our design improves accuracy by 1.1-12% while reducing deployment costs by up to 20% across various agentic applications.
TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification
Recent advances such as Chain-of-Thought prompting have significantly improved large language models (LLMs) in zero-shot medical reasoning. However, prompting-based methods often remain shallow and unstable, while fine-tuned medical LLMs suffer from poor generalization under distribution shifts and limited adaptability to unseen clinical scenarios. To address these limitations, we present TAGS, a test-time framework that combines a broadly capable generalist with a domain-specific specialist to offer complementary perspectives without any model fine-tuning or parameter updates. To support this generalist-specialist reasoning process, we introduce two auxiliary modules: a hierarchical retrieval mechanism that provides multi-scale exemplars by selecting examples based on both semantic and rationale-level similarity, and a reliability scorer that evaluates reasoning consistency to guide final answer aggregation. TAGS achieves strong performance across nine MedQA benchmarks, boosting GPT-4o accuracy by 13.8%, DeepSeek-R1 by 16.8%, and improving a vanilla 7B model from 14.1% to 23.9%. These results surpass several fine-tuned medical LLMs, without any parameter updates. The code will be available at https://github.com/JianghaoWu/TAGS.
comment: 16 pages including references, 2 figures
Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control
Complex tasks are increasingly delegated to ensembles of specialized LLM-based agents that reason, communicate, and coordinate actions-both among themselves and through interactions with external tools, APIs, and databases. While persistent memory has been shown to enhance single-agent performance, most approaches assume a monolithic, single-user context-overlooking the benefits and challenges of knowledge transfer across users under dynamic, asymmetric permissions. We introduce Collaborative Memory, a framework for multi-user, multi-agent environments with asymmetric, time-evolving access controls encoded as bipartite graphs linking users, agents, and resources. Our system maintains two memory tiers: (1) private memory-private fragments visible only to their originating user; and (2) shared memory-selectively shared fragments. Each fragment carries immutable provenance attributes (contributing agents, accessed resources, and timestamps) to support retrospective permission checks. Granular read policies enforce current user-agent-resource constraints and project existing memory fragments into filtered transformed views. Write policies determine fragment retention and sharing, applying context-aware transformations to update the memory. Both policies may be designed conditioned on system, agent, and user-level information. Our framework enables safe, efficient, and interpretable cross-user knowledge sharing, with provable adherence to asymmetric, time-varying policies and full auditability of memory operations.
Implementing Agents in JavaScript
This chapter gives an introduction to agent-oriented programming in JavaScript. It provides an example-based walk-through of how to implement abstractions for reasoning loop agents in vanilla JavaScript. The initial example is used as a stepping stone for explaining how to implement slightly more advanced agents and multi-agent systems using JS-son, a JavaScript library for agent-oriented programming. In this context, the chapter also explains how to integrate reasoning loop agents with generative AI technologies--specifically, large language models. Finally, application scenarios in several technology ecosystems and future research directions are sketched.
comment: This chapter will eventually by published in the book "Agents and Multi-Agent Systems Development -- Platforms, Toolkits, Technologies", edited by Collier, Mascardi, and Ricci
Uncovering Bottlenecks and Optimizing Scientific Lab Workflows with Cycle Time Reduction Agents
Scientific laboratories, particularly those in pharmaceutical and biotechnology companies, encounter significant challenges in optimizing workflows due to the complexity and volume of tasks such as compound screening and assay execution. We introduce Cycle Time Reduction Agents (CTRA), a LangGraph-based agentic workflow designed to automate the analysis of lab operational metrics. CTRA comprises three main components: the Question Creation Agent for initiating analysis, Operational Metrics Agents for data extraction and validation, and Insights Agents for reporting and visualization, identifying bottlenecks in lab processes. This paper details CTRA's architecture, evaluates its performance on a lab dataset, and discusses its potential to accelerate pharmaceutical and biotechnological development. CTRA offers a scalable framework for reducing cycle times in scientific labs.
Towards Natural Language Communication for Cooperative Autonomous Driving via Self-Play
Past work has demonstrated that autonomous vehicles can drive more safely if they communicate with one another than if they do not. However, their communication has often not been human-understandable. Using natural language as a vehicle-to-vehicle (V2V) communication protocol offers the potential for autonomous vehicles to drive cooperatively not only with each other but also with human drivers. In this work, we propose a suite of traffic tasks in autonomous driving where vehicles in a traffic scenario need to communicate in natural language to facilitate coordination in order to avoid an imminent collision and/or support efficient traffic flow. To this end, this paper introduces a novel method, LLM+Debrief, to learn a message generation and high-level decision-making policy for autonomous vehicles through multi-agent discussion. To evaluate LLM agents for driving, we developed a gym-like simulation environment that contains a range of driving scenarios. Our experimental results demonstrate that LLM+Debrief is more effective at generating meaningful and human-understandable natural language messages to facilitate cooperation and coordination than a zero-shot LLM agent. Our code and demo videos are available at https://talking-vehicles.github.io/.
The Hamiltonian of Poly-matrix Zero-sum Games
Understanding a dynamical system fundamentally relies on establishing an appropriate Hamiltonian function and elucidating its symmetries. By formulating agents' strategies and cumulative payoffs as canonically conjugate variables, we identify the Hamiltonian function that generates the dynamics of poly-matrix zero-sum games. We reveal the symmetries of our Hamiltonian and derive the associated conserved quantities, showing how the conservation of probability and the invariance of the Fenchel coupling are intrinsically encoded within the system. Furthermore, we propose the dissipation FTRL (DFTRL) dynamics by introducing a perturbation that dissipates the Fenchel coupling, proving convergence to the Nash equilibrium and linking DFTRL to last-iterate convergent algorithms. Our results highlight the potential of Hamiltonian dynamics in uncovering the structural properties of learning dynamics in games, and pave the way for broader applications of Hamiltonian dynamics in game theory and machine learning.
comment: 26 pages, 4 figures
HYGMA: Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning faces significant challenges in effectively organizing agent relationships and facilitating information exchange, particularly when agents need to adapt their coordination patterns dynamically. This paper presents a novel framework that integrates dynamic spectral clustering with hypergraph neural networks to enable adaptive group formation and efficient information processing in multi-agent systems. The proposed framework dynamically constructs and updates hypergraph structures through spectral clustering on agents' state histories, enabling higher-order relationships to emerge naturally from agent interactions. The hypergraph structure is enhanced with attention mechanisms for selective information processing, providing an expressive and efficient way to model complex agent relationships. This architecture can be implemented in both value-based and policy-based paradigms through a unified objective combining task performance with structural regularization. Extensive experiments on challenging cooperative tasks demonstrate that our method significantly outperforms state-of-the-art approaches in both sample efficiency and final performance.
The Cognitive Foundations of Economic Exchange: A Modular Framework Grounded in Behavioral Evidence
A key challenge in multi-agent AI is modeling social cooperation under realistic behavioral constraints. Many foundational concepts in economics and ethics such as "trust" or "morality" are often defined informally, without operational criteria or cognitive grounding, which limits their testability and implementation in artificial agents. Drawing on converging empirical evidence from primate behavior, infant cognition, and economic anthropology, we propose a conceptual framework composed of three cognitively minimal mechanisms: individual recognition, reciprocal credence, and cost return sensitivity. This framework reframes trust as a graded cognitive expectation, providing a simulateable basis for reciprocal exchange in artificial agents, and enabling the bottom-up emergence of scalable cooperation and institutional dynamics.
comment: This version presents the full position paper, with a complete theoretical framework and implementation pathway
EFX Exists for Three Types of Agents
We study the problem of finding an envy-free allocation of indivisible goods among agents with additive valuations. We focus on the fairness notion of envy-freeness up to any good (EFX). A central open question in fair division is whether EFX allocations always exist for any number of agents. While EFX has been established for three agents [CGM24] and for any number of agents with at most two distinct valuations [Mah23], its existence in more general settings remains open. In this paper, we make significant progress by proving that EFX allocations exist for any number of agents when there are at most three distinct additive valuations. This result simultaneously generalizes both the three-agent case and the two-type case, settling an open question in the field (see [Mah23]).
comment: This paper has been accepted for publication in the Twenty-Sixth ACM Conference on Economics and Computation (EC 2025)
Scalable Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantee: A Constrained Mean-Field Reinforcement Learning Approach
The rapid expansion of ride-sourcing services such as Uber, Lyft, and Didi Chuxing has fundamentally reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing - the strategic repositioning of a large group of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing not only results in prolonged rider waiting times and inefficient vehicle utilization but also leads to fairness issues, such as the inequitable distribution of service quality and disparities in driver income. To tackle these complexities, we introduce continuous-state mean-field control (MFC) and mean-field reinforcement learning (MFRL) models that employ continuous vehicle repositioning actions. MFC and MFRL offer scalable solutions by modeling each vehicle's behavior through interaction with the vehicle distribution, rather than with individual vehicles. This limits the issues arising from the curse of dimensionality inherent in traditional multi-agent methods, enabling coordination across large fleets with significantly reduced computational complexity. To ensure equitable service access across geographic regions, we integrate an accessibility constraint into both models. Extensive empirical evaluation using real-world data-driven simulation of Shenzhen demonstrates the real-time efficiency and robustness of our approach. Remarkably, it scales to tens of thousands of vehicles, with training times comparable to the decision time of a single linear programming rebalancing. Besides, policies generated by our approach effectively explore the efficiency-equity Pareto front, outperforming conventional benchmarks across key metrics like fleet utilization, fulfilled requests, and pickup distance, while ensuring equitable service access.
comment: 32 pages, 12 figures
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems ICML
Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extensive failure logs from 127 LLM multi-agent systems with fine-grained annotations linking failures to specific agents and decisive error steps. Using the Who&When, we develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps, with some methods performing below random. Even SOTA reasoning models, such as OpenAI o1 and DeepSeek R1, fail to achieve practical usability. These results highlight the task's complexity and the need for further research in this area. Code and dataset are available at https://github.com/mingyin1/Agents_Failure_Attribution
comment: revise affiliation. indicate ICML processed
Agent-Based Simulation of UAV Battery Recharging for IoT Applications: Precision Agriculture, Disaster Recovery, and Dengue Vector Control
The low battery autonomy of Unnamed Aerial Vehicles (UAVs or drones) can make smart farming (precision agriculture), disaster recovery, and the fighting against dengue vector applications difficult. This article considers two approaches, first enumerating the characteristics observed in these three IoT application types and then modeling an UAV's battery recharge coordination using the Agent-Based Simulation (ABS) approach. In this way, we propose that each drone inside the swarm does not communicate concerning this recharge coordination decision, reducing energy usage and permitting remote usage. A total of 6000 simulations were run to evaluate how two proposed policies, the BaseLine (BL) and ChargerThershold (CT) coordination recharging policy, behave in 30 situations regarding how each simulation sets conclude the simulation runs and how much time they work until recharging results. CT policy shows more reliable results in extreme system usage. This work conclusion presents the potential of these three IoT applications to achieve their perpetual service without communication between drones and ground stations. This work can be a baseline for future policies and simulation parameter enhancements.
comment: 22 pages
Systems and Control (CS)
Analysis on Energy Efficiency of RIS-Assisted Multiuser Downlink Near-Field Communications
In this paper, we focus on the energy efficiency (EE) optimization and analysis of reconfigurable intelligent surface (RIS)-assisted multiuser downlink near-field communications. Specifically, we conduct a comprehensive study on several key factors affecting EE performance, including the number of RIS elements, the types of reconfigurable elements, reconfiguration resolutions, and the maximum transmit power. To accurately capture the power characteristics of RISs, we adopt more practical power consumption models for three commonly used reconfigurable elements in RISs: PIN diodes, varactor diodes, and radio frequency (RF) switches. These different elements may result in RIS systems exhibiting significantly different energy efficiencies (EEs), even when their spectral efficiencies (SEs) are similar. Considering discrete phases implemented at most RISs in practice, which makes their optimization NP-hard, we develop a nested alternating optimization framework to maximize EE, consisting of an outer integer-based optimization for discrete RIS phase reconfigurations and a nested non-convex optimization for continuous transmit power allocation within each iteration. Extensive comparisons with multiple benchmark schemes validate the effectiveness and efficiency of the proposed framework. Furthermore, based on the proposed optimization method, we analyze the EE performance of RISs across different key factors and identify the optimal RIS architecture yielding the highest EE.
comment: 14 pages, 15 figures. This work has been accepted for publication in IEEE Transactions on Communications
Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons
Many preference elicitation algorithms consider preference over propositional logic formulas or items with different attributes. In sequential decision making, a user's preference can be a preorder over possible outcomes, each of which is a temporal sequence of events. This paper considers a class of preference inference problems where the user's unknown preference is represented by a preorder over regular languages (sets of temporal sequences), referred to as temporal goals. Given a finite set of pairwise comparisons between finite words, the objective is to learn both the set of temporal goals and the preorder over these goals. We first show that a preference relation over temporal goals can be modeled by a Preference Deterministic Finite Automaton (PDFA), which is a deterministic finite automaton augmented with a preorder over acceptance conditions. The problem of preference inference reduces to learning the PDFA. This problem is shown to be computationally challenging, with the problem of determining whether there exists a PDFA of size smaller than a given integer $k$, consistent with the sample, being NP-Complete. We formalize the properties of characteristic samples and develop an algorithm that guarantees to learn, given a characteristic sample, the minimal PDFA equivalent to the true PDFA from which the sample is drawn. We present the method through a running example and provide detailed analysis using a robotic motion planning problem.
comment: 16 pages, 11 figures, technical report, submission under review
Transient Slack Capability
This paper introduces the concept of Transient Slack Capability (TSC), a set of three necessary device-level conditions to ensure stability under sustained power perturbations. TSC states that a device must (1) possess sufficient stored energy; (2) a controlled input power; and (3) maintain internal energy balance and synchronization. The paper shows that the relation among the time-scales of storage, control, and power perturbation is at the core of the TSC concept. Using the port-Hamiltonian (PH) framework, these conditions are formalized and validated via simulations on an adapted model of the WSCC 9-bus system. Case studies demonstrate that TSC is achievable in both Grid-Following (GFL) and Grid-Forming (GFM) converter control schemes, provided the conditions above are satisfied. Sensitivity analysis serves to identify storage and power reserve requirements to meet Conditions 1 and 2; the impact of converter current limiters on Condition 3; and inertia-less solutions able to achieve TSC.
Preliminary Characterization of Bio-inspired Dog-Nose Sampler for Aerosol Detection
Before aerosols can be sensed, sampling technologies must capture the particulate matter of interest. To that end, for systems deployed in open environments where the location of the aerosol is unknown, extending the reach of the sampler could lessen the precision required in sensor placement or reduce the number of sensors required for full spatial coverage. Inspired by the sensitivity of the canine olfactory system, this paper presents a rudimentary sampler that mimics the air flow of a dog's nose. The design consists of speed-controlled inhalation jets, as well as exhalation jets that are angled down and to the side. We tested this design on volatile organic compounds (VOC) in a small number of scenarios to validate the concept and understand how the system behaves. We show that in preliminary testing this dog-nose setup provides improvements over passive and solely inhalation sensing.
comment: 5 pages
Selection Mechanisms for Sequence Modeling using Linear State Space Models
Recent advancements in language modeling tasks have been driven by architectures such as Transformers and, more recently, by Selective State Space Models (SSMs). In this paper, we introduce an alternative selection mechanism inspired by control theory methodologies. Specifically, we propose a novel residual generator for selection, drawing an analogy to fault detection strategies in Linear Time-Invariant (LTI) systems. Unlike Mamba, which utilizes Linear Time-Varying (LTV) systems, our approach combines multiple LTI systems, preserving their beneficial properties during training while achieving comparable selectivity. To evaluate the effectiveness of the proposed architecture, we test its performance on synthetic tasks. While these tasks are not inherently critical, they serve as benchmarks to test the selectivity properties of different cores architecture. This work highlights the potential of integrating theoretical insights with experimental advancements, offering a complementary perspective to deep learning innovations at the intersection of control theory and machine learning.
comment: 9 pages, 5 figures
Sufficient Conditions for Detectability of Approximately Discretized Nonlinear Systems
In many sampled-data applications, observers are designed based on approximately discretized models of continuous-time systems, where usually only the discretized system is analyzed in terms of its detectability. In this paper, we show that if the continuous-time system satisfies certain linear matrix inequality (LMI) conditions, and the sampling period of the discretization scheme is sufficiently small, then the whole family of discretized systems (parameterized by the sampling period) satisfies analogous discrete-time LMI conditions that imply detectability. Our results are applicable to general discretization schemes, as long as they produce approximate models whose linearizations are in some sense consistent with the linearizations of the continuous-time ones. We explicitly show that the Euler and second-order Runge-Kutta methods satisfy this condition. A batch-reactor system example is provided to highlight the usefulness of our results from a practical perspective.
comment: 7 pages, submitted to 23rd European Control Conference
TransDF: Time-Series Forecasting Needs Transformed Label Alignment
Training time-series forecasting models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance. Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Extensive experiments demonstrate that TransDF achieves state-of-the-art performance and is compatible with various forecasting models. Code is available at https://anonymous.4open.science/r/TransDF-88CF.
Periodic Regulation of Linear Time-Delay Systems via Youla-Kučera Parametrization SC
The paper proposes an alternative way to achieve the Internal Model Principle (IMP) in contrast to the standard way, where a model of the signal one wishes to track/reject is directly substituted into the closed-loop. The proposed alternative approach relies on an already-existing stabilizing controller, which can be further augmented with a Youla-Ku\v{c}era parameter to let it implicitly admit a model of a signal without altering its stabilizing feature. Thus, with the proposed method, the standard design steps of realizing IMP are reversed. The validity and potential of the proposed approach are demonstrated by considering three different types of time-delay systems. It is shown through simulations that all considered unstable systems, despite the infinite-dimensional closed-loop, can be straightforwardly periodically regulated by augmenting their stabilizing PI controllers. Thanks to a specifically chosen structure for the Youla-Ku\v{c}era parameter, the required tuning can be done by solving a set of linear equations.
comment: 6 pages, 8 figures, submitted February 10, 2025, to Joint SSSC-TDS-COSY
HRSim: An agent-based simulation platform for high-capacity ride-sharing services
The rapid growth of ride-sharing services presents a promising solution to urban transportation challenges, such as congestion and carbon emissions. However, developing efficient operational strategies, such as pricing, matching, and fleet management, requires robust simulation tools that can replicate real-world dynamics at scale. Existing platforms often lack the capacity, flexibility, or open-source accessibility needed to support large-scale, high-capacity ride-sharing services. To address these gaps, we introduce HRSim, an open-source, agent-based High-capacity Ride-sharing Simulator. HRSim integrates real-world road networks and demand data to simulate dynamic ride-sharing operations, including pricing, routing, matching, and repositioning. Its module design supports both ride-sharing and solo-hailing service modes. Also, it includes a visualization module for real-time performance analysis. In addition, HRSim incorporates integer linear programming and heuristic algorithms, which can achieve large-scale simulations of high-capacity ride-sharing services. Applications demonstrate HRSim's utility in various perspectives, including quantifying carbon emissions, scaling ride-sharing performance, evaluating new strategies, etc. By bridging the gap between theoretical research and practical implementation, HRSim serves as a versatile testbed for policymakers and transportation network companies to optimize ride-sharing systems for efficiency and sustainability.
Risk-based approach to the Optimal Transmission Switching problem
This paper deals with the secure Optimal Transmission Switching (OTS) problem in situations where the TSO is forced to accept the risk that some contingencies may result in the de-energization of parts of the grid to avoid the violation of operational limits. This operational policy, which mainly applies to subtransmission systems, is first discussed. Then, a model of that policy is proposed that complements the classical MILP model of the N-1 secure OTS problem. It comprises a connectivity and notably a partial grid loss analysis for branch outage contingencies. Finally, its application to the IEEE 14-bus system is presented. Solutions similar to those observed in operation are reached by the algorithm, notably revealing the preventive-openings-cascade phenomenon.
comment: 2025 IEEE PowerTech Kiel
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
This research proposes methods for formulating and guaranteeing the resilience of long short-term memory (LSTM) networks, which can serve as a key technology in AI system quality assurance. We introduce a novel methodology applying incremental input-to-state stability ($\delta$ISS) to mathematically define and evaluate the resilience of LSTM against input perturbations. Key achievements include the development of a data-independent evaluation method and the demonstration of resilience control through adjustments to training parameters. This research presents concrete solutions to AI quality assurance from a control theory perspective, which can advance AI applications in control systems.
comment: 8 pages, 6 figures. Appendix: 16 pages. First three listed authors have equal contributions
ExARNN: An Environment-Driven Adaptive RNN for Learning Non-Stationary Power Dynamics
Non-stationary power system dynamics, influenced by renewable energy variability, evolving demand patterns, and climate change, are becoming increasingly complex. Accurately capturing these dynamics requires a model capable of adapting to environmental factors. Traditional models, including Recurrent Neural Networks (RNNs), lack efficient mechanisms to encode external factors, such as time or environmental data, for dynamic adaptation. To address this, we propose the External Adaptive RNN (ExARNN), a novel framework that integrates external data (e.g., weather, time) to continuously adjust the parameters of a base RNN. ExARNN achieves this through a hierarchical hypernetwork design, using Neural Controlled Differential Equations (NCDE) to process external data and generate RNN parameters adaptively. This approach enables ExARNN to handle inconsistent timestamps between power and external measurements, ensuring continuous adaptation. Extensive forecasting tests demonstrate ExARNN's superiority over established baseline models.
comment: 5 pages, 3 figures, conference
Autonomous Circular Drift Control for 4WD-4WS Vehicles Without Precomputed Drifting Equilibrium
Under extreme conditions, autonomous drifting enables vehicles to follow predefined paths at large slip angles, significantly enhancing the control system's capability to handle hazardous scenarios. Four-wheel-drive and four-wheel-steering (4WD-4WS) vehicles, which have been extensively studied, offer superior path-following precision and enhanced maneuverability under challenging driving conditions. In this paper, a hierarchical drifting controller is proposed for 4WD-4WS vehicles to track both path and velocity without relying on precomputed drifting equilibrium. The controller is structured into two layers: a trajectory tracking layer and an actuator regulation layer. The first layer generates the desired tire forces in the vehicle body frame, while the second layer converts these desired tire forces into steering angle commands and torque commands for the front and rear motors. The effectiveness and robustness of the proposed controller are validated through simulation.
A Dynamic Phasor Framework for Analysis of Grid-Forming Converter Connected to Series-Compensated Line
A dynamic phasor (DP) framework for time-domain and frequency-domain analyses of grid-forming converters (GFCs) connected to series-compensated transmission lines is proposed. The proposed framework can capture the behavior of GFCs subjected to unbalanced short circuit faults in presence of different current limiting strategies. Moreover, the linearizability and time invariance of this framework allows us to perform eigen decomposition, which is a powerful tool for root-cause analysis and control design. We show that a certain degree of series compensation may result in poorly-damped oscillations in presence of the grid-forming converter. A participation factor analysis using the DP model reveals that the point of interconnection voltage angle is dominant in this mode. Eigenvalue sensitivity analysis of controller parameters shows that reducing the power-frequency droop coefficient is most effective in stabilizing the poorly-damped mode. Detailed validation with electromagnetic transient model demonstrates the accuracy of the proposed framework.
State of health prediction of lithium-ion batteries for driving conditions based on full parameter domain sparrow search algorithm and dual-module bidirectional gated recurrent unit
Aiming at the state of health (SOH) prediction of lithium-ion batteries (LiBs) for electric vehicles (EVs), this paper proposes a fusion model of a dual-module bidirectional gated recurrent unit (BiGRU) and sparrow search algorithm (SSA) with full parameter domain optimization. With the help of Spearman correlation analysis and ablation experiments, the indirect health indicator (HI) that can characterize the battery degradation is extracted first based on the incremental capacity (IC) curves of the Oxford battery dataset, which simulates the driving conditions. On this basis, the filtered one-dimensional HI is inputted into the dual-module BiGRU for learning the pre- and post-textual information of the input sequence and extracting the sequence features. In order to combine the different hyperparameters in the dual-module BiGRU, SSA is used to optimize the hyperparameters in the full parameter domain. The proposed SSA-BiGRU model combines the advantages and structures of SSA and BiGRU to achieve the highly accurate SOH prediction of LiBs. Studies based on the Oxford battery dataset have shown that the SSA-BiGRU model has higher accuracy, better robustness and generalization ability. Moreover, the proposed SSA-BiGRU model is tested on a real road-driven EV charging dataset and accurate SOH prediction are obtained.
Sampled-data Systems: Stability, Contractivity and Single-iteration Suboptimal MPC
This paper analyzes the stability of interconnected continuous-time (CT) and discrete-time (DT) systems coupled through sampling and zero-order hold mechanisms. The DT system updates its output at regular intervals $T>0$ by applying an $n$-fold composition of a given map. This setup is motivated by online and sampled-data implementations of optimization-based controllers - particularly model predictive control (MPC) - where the DT system models $n$ iterations of an algorithm approximating the solution of an optimization problem. We introduce the concept of a reduced model, defined as the limiting behavior of the sampled-data system as $T \to 0^+$ and $n \to +\infty$. Our main theoretical contribution establishes that when the reduced model is contractive, there exists a threshold duration $T(n)$ for each iteration count $n$ such that the CT-DT interconnection achieves exponential stability for all sampling periods $T < T(n)$. Finally, under the stronger condition that both the CT and DT systems are contractive, we show exponential stability of their interconnection using a small-gain argument. Our theoretical results provide new insights into suboptimal MPC stability, showing that convergence guarantees hold even when using a single iteration of the optimization algorithm - a practically significant finding for real-time control applications.
MorphEUS: Morphable Omnidirectional Unmanned System
Omnidirectional aerial vehicles (OMAVs) have opened up a wide range of possibilities for inspection, navigation, and manipulation applications using drones. In this paper, we introduce MorphEUS, a morphable co-axial quadrotor that can control position and orientation independently with high efficiency. It uses a paired servo motor mechanism for each rotor arm, capable of pointing the vectored-thrust in any arbitrary direction. As compared to the \textit{state-of-the-art} OMAVs, we achieve higher and more uniform force/torque reachability with a smaller footprint and minimum thrust cancellations. The overactuated nature of the system also results in resiliency to rotor or servo-motor failures. The capabilities of this quadrotor are particularly well-suited for contact-based infrastructure inspection and close-proximity imaging of complex geometries. In the accompanying control pipeline, we present theoretical results for full controllability, almost-everywhere exponential stability, and thrust-energy optimality. We evaluate our design and controller on high-fidelity simulations showcasing the trajectory-tracking capabilities of the vehicle during various tasks. Supplementary details and experimental videos are available on the project webpage.
ZV-Sim: Probabilistic Simulation Framework for Pre-emergent Novel Zoonose Tracking
ZV-Sim is an open-source, modular Python framework for probabilistic simulation and analysis of pre-emergent novel zoonotic diseases using pervasive sensing data. It incorporates customizable Human and Animal Presence agents that leverage known and simulated location data, contact networks, and illness reports to assess and predict disease origins and spread. The framework supports Monte Carlo experiments to analyze outcomes with various user-defined movement and probability models. Although initial models are basic and illustrative, ZV-Sim's extensible design facilitates the integration of more sophisticated models as richer data become available, enhancing future capabilities in zoonotic disease tracking. The source code is publicly available \href{https://github.com/jmaff/zv-sim}{\underline{\textit{here}}}.
comment: 5 pages
Gaussian Processes in Power Systems: Techniques, Applications, and Future Works
The increasing integration of renewable energy sources (RESs) and distributed energy resources (DERs) has significantly heightened operational complexity and uncertainty in modern power systems. Concurrently, the widespread deployment of smart meters, phasor measurement units (PMUs) and other sensors has generated vast spatiotemporal data streams, enabling advanced data-driven analytics and decision-making in grid operations. In this context, Gaussian processes (GPs) have emerged as a powerful probabilistic framework, offering uncertainty quantification, non-parametric modeling, and predictive capabilities to enhance power system analysis and control. This paper presents a comprehensive review of GP techniques and their applications in power system operation and control. GP applications are reviewed across three key domains: GP-based modeling, risk assessment, and optimization and control. These areas serve as representative examples of how GP can be utilized in power systems. Furthermore, critical challenges in GP applications are discussed, and potential research directions are outlined to facilitate future power system operations.
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
An Operator-Theoretic Framework to Simulate Neuromorphic Circuits
Splitting algorithms are well-established in convex optimization and are designed to solve large-scale problems. Using such algorithms to simulate the behavior of nonlinear circuit networks provides scalable methods for the simulation and design of neuromorphic systems. For circuits made of linear capacitors and inductors with nonlinear resistive elements, we propose a splitting that breaks the network into its LTI lossless component and its static resistive component. This splitting has both physical and algorithmic advantages and allows for separate calculations in the time domain and in the frequency domain. To demonstrate the scalability of this approach, a network made from one hundred neurons modeled by the well-known FitzHugh-Nagumo circuit with all-to-all diffusive coupling is simulated.
comment: Accepted to CDC 2024
Adaptive Input Design for Nonlinear System Identification with Operational Constraints
We consider the problem of joint input design and parameter estimation for identifying nonlinear system models through the sequential acquisition of measurements while adhering to system constraints. We utilize a receding horizon approach and propose a new scale-invariant input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential parameter estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
Robust Nash equilibrium seeking based on semi-Markov switching topologies
This paper investigates a distributed robust Nash Equilibrium (NE) seeking problem in fluctuating environments. Specifically, the players, subject to the second-order dynamics, are considered to be influenced by external disturbances and uncertain dynamics while communicating via semi-Markov switching topologies. In such constantly changing network circumstances, the existence of disturbances and uncertain dynamics may directly affect the performance of most existing NE seeking algorithms. Moreover, the semi-Markov switching topologies may cause communication uncertainty, which are considered in NE seeking for the first time. To accommodate the above concerns, the following targets require to be reached simultaneously: (1) Disturbances and uncertain dynamics rejection in finite time; (2) Distributed estimation on unknown information required for players' cost functions; (3) A reasonable estimation consensus protocol under semi-Markov switching; (4) NE seeking for the second-order players. By combining supertwisting-based Integral Sliding-Mode Control (ISMC) with average consensus tracking, a novel robust NE seeking algorithm is constructed, incorporating an effective leader-follower consensus protocol. Furthermore, to lessen dispensable information transmission, a sampled-data-based event-triggered mechanism is introduced. Incorporating the advantages of both semi-Markov switching and event-triggered mechanism, another NE seeking algorithm is proposed. Through designing an appropriate Lyapunov-Krasovskii functional, it is shown that the leader-follower consensus can be achieved in the mean-square sense under event-triggered mechanism. Finally, a connectivity control game is formulated to illustrate the validity of the designed algorithms.
comment: This work has been submitted to the lEEE for possible publication
Toward Near-Globally Optimal Nonlinear Model Predictive Control via Diffusion Models
Achieving global optimality in nonlinear model predictive control (NMPC) is challenging due to the non-convex nature of the underlying optimization problem. Since commonly employed local optimization techniques depend on carefully chosen initial guesses, this non-convexity often leads to suboptimal performance resulting from local optima. To overcome this limitation, we propose a novel diffusion model-based approach for near-globally optimal NMPC consisting of an offline and an online phase. The offline phase employs a local optimizer to sample from the distribution of optimal NMPC control sequences along generated system trajectories through random initial guesses. Subsequently, the generated diverse dataset is used to train a diffusion model to reflect the multi-modal distribution of optima. In the online phase, the trained model is leveraged to efficiently perform a variant of random shooting optimization to obtain near-globally optimal control sequences without relying on any initial guesses or online NMPC solving. The effectiveness of our approach is illustrated in a numerical simulation indicating high performance benefits compared to direct neural network approximations of NMPC and significantly lower computation times than online solving NMPC using global optimizers.
comment: This paper has been accepted by the 2025 7th Annual Learning for Dynamics & Control Conference (L4DC) as an oral presentation and has been nominated for the best paper award
The Barzilai-Borwein Method for Distributed Optimization over Unbalanced Directed Networks
This paper studies optimization problems over multi-agent systems, in which all agents cooperatively minimize a global objective function expressed as a sum of local cost functions. Each agent in the systems uses only local computation and communication in the overall process without leaking their private information. Based on the Barzilai-Borwein (BB) method and multi-consensus inner loops, a distributed algorithm with the availability of larger stepsizes and accelerated convergence, namely ADBB, is proposed. Moreover, owing to employing only row-stochastic weight matrices, ADBB can resolve the optimization problems over unbalanced directed networks without requiring the knowledge of neighbors' out-degree for each agent. Via establishing contraction relationships between the consensus error, the optimality gap, and the gradient tracking error, ADBB is theoretically proved to converge linearly to the globally optimal solution. A real-world data set is used in simulations to validate the correctness of the theoretical analysis.
comment: 33 pages, 8 figures
Koopman Operators in Robot Learning
Koopman operator theory offers a rigorous treatment of dynamics and has been emerging as an alternative modeling and learning-based control method across various robotics sub-domains. Due to its ability to represent nonlinear dynamics as a linear (but higher-dimensional) operator, Koopman theory offers a fresh lens through which to understand and tackle the modeling and control of complex robotic systems. Moreover, it enables incremental updates and is computationally inexpensive, thus making it particularly appealing for real-time applications and online active learning. This review delves deeply into the foundations of Koopman operator theory and systematically builds a bridge from theoretical principles to practical robotic applications. We begin by explaining the mathematical underpinnings of the Koopman framework and discussing approximation approaches for incorporating inputs into Koopman-based modeling. Foundational considerations, such as data collection strategies as well as the design of lifting functions for effective system embedding, are also discussed. We then explore how Koopman-based models serve as a unifying tool for a range of robotics tasks, including model-based control, real-time state estimation, and motion planning. The review proceeds to a survey of cutting-edge research that demonstrates the versatility and growing impact of Koopman methods across diverse robotics sub-domains: from aerial and legged platforms to manipulators, soft-bodied systems, and multi-agent networks. A presentation of more advanced theoretical topics, necessary to push forward the overall framework, is included. Finally, we reflect on some key open challenges that remain and articulate future research directions that will shape the next phase of Koopman-inspired robotics. To support practical adoption, we provide a hands-on tutorial with executable code at https://shorturl.at/ouE59.
comment: Submitted to TRO
The Cesàro Value Iteration
In this paper, we address the problem of undiscouted infinite-horizon optimal control for deterministic systems where the classic value iteration does not converge. For such systems, we propose to use the Ces\`aro mean to define the infinite-horizon optimal control problem and the corresponding infinite-horizon value function. Moreover, for this value function, we introduce the Ces\`aro value iteration and prove its convergence for the special case of systems with periodic optimal operating behavior.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
A complex spatial frequency approach to optimal control of finite-extent linear evolution systems
We consider the linear quadratic regulator (LQR) for one-dimensional linear evolution partial differential equations (PDEs) on a finite interval in space. The control is applied as an additive forcing term to PDEs. Existing methods for closed-form optimal control only apply to homogeneous (zero) boundary conditions, often resulting in series representations. In this paper, we consider general smooth boundary conditions. We use the unified transform, namely the Fourier transform restricted to the bounded spatial domain, to decouple PDEs into a family of ordinary differential equations (ODEs) parameterized by complex spatial frequency variables. Then, optimal control in the frequency domain is derived using LQR theory for ODEs. The inverse Fourier transform leads to non-causal terms in optimal control corresponding to integrals, over the real line, of future values of unspecified boundary conditions. To eliminate this non-causality, we deform the integrals to well-constructed contours in the complex plane along which the contribution of unknowns vanishes. For the reaction-diffusion equation, we show that the integral representation can be reformulated as a series representation, which leads to a state-feedback convolution form for optimal control, with the boundary conditions appearing as an additive term. In numerical experiments, we illustrate the computational advantages of the integral representation in comparison to the series representation and structural properties of the convolution kernel.
comment: 22 pages, 9 figures
Systems and Control (EESS)
Analysis on Energy Efficiency of RIS-Assisted Multiuser Downlink Near-Field Communications
In this paper, we focus on the energy efficiency (EE) optimization and analysis of reconfigurable intelligent surface (RIS)-assisted multiuser downlink near-field communications. Specifically, we conduct a comprehensive study on several key factors affecting EE performance, including the number of RIS elements, the types of reconfigurable elements, reconfiguration resolutions, and the maximum transmit power. To accurately capture the power characteristics of RISs, we adopt more practical power consumption models for three commonly used reconfigurable elements in RISs: PIN diodes, varactor diodes, and radio frequency (RF) switches. These different elements may result in RIS systems exhibiting significantly different energy efficiencies (EEs), even when their spectral efficiencies (SEs) are similar. Considering discrete phases implemented at most RISs in practice, which makes their optimization NP-hard, we develop a nested alternating optimization framework to maximize EE, consisting of an outer integer-based optimization for discrete RIS phase reconfigurations and a nested non-convex optimization for continuous transmit power allocation within each iteration. Extensive comparisons with multiple benchmark schemes validate the effectiveness and efficiency of the proposed framework. Furthermore, based on the proposed optimization method, we analyze the EE performance of RISs across different key factors and identify the optimal RIS architecture yielding the highest EE.
comment: 14 pages, 15 figures. This work has been accepted for publication in IEEE Transactions on Communications
Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons
Many preference elicitation algorithms consider preference over propositional logic formulas or items with different attributes. In sequential decision making, a user's preference can be a preorder over possible outcomes, each of which is a temporal sequence of events. This paper considers a class of preference inference problems where the user's unknown preference is represented by a preorder over regular languages (sets of temporal sequences), referred to as temporal goals. Given a finite set of pairwise comparisons between finite words, the objective is to learn both the set of temporal goals and the preorder over these goals. We first show that a preference relation over temporal goals can be modeled by a Preference Deterministic Finite Automaton (PDFA), which is a deterministic finite automaton augmented with a preorder over acceptance conditions. The problem of preference inference reduces to learning the PDFA. This problem is shown to be computationally challenging, with the problem of determining whether there exists a PDFA of size smaller than a given integer $k$, consistent with the sample, being NP-Complete. We formalize the properties of characteristic samples and develop an algorithm that guarantees to learn, given a characteristic sample, the minimal PDFA equivalent to the true PDFA from which the sample is drawn. We present the method through a running example and provide detailed analysis using a robotic motion planning problem.
comment: 16 pages, 11 figures, technical report, submission under review
Transient Slack Capability
This paper introduces the concept of Transient Slack Capability (TSC), a set of three necessary device-level conditions to ensure stability under sustained power perturbations. TSC states that a device must (1) possess sufficient stored energy; (2) a controlled input power; and (3) maintain internal energy balance and synchronization. The paper shows that the relation among the time-scales of storage, control, and power perturbation is at the core of the TSC concept. Using the port-Hamiltonian (PH) framework, these conditions are formalized and validated via simulations on an adapted model of the WSCC 9-bus system. Case studies demonstrate that TSC is achievable in both Grid-Following (GFL) and Grid-Forming (GFM) converter control schemes, provided the conditions above are satisfied. Sensitivity analysis serves to identify storage and power reserve requirements to meet Conditions 1 and 2; the impact of converter current limiters on Condition 3; and inertia-less solutions able to achieve TSC.
Preliminary Characterization of Bio-inspired Dog-Nose Sampler for Aerosol Detection
Before aerosols can be sensed, sampling technologies must capture the particulate matter of interest. To that end, for systems deployed in open environments where the location of the aerosol is unknown, extending the reach of the sampler could lessen the precision required in sensor placement or reduce the number of sensors required for full spatial coverage. Inspired by the sensitivity of the canine olfactory system, this paper presents a rudimentary sampler that mimics the air flow of a dog's nose. The design consists of speed-controlled inhalation jets, as well as exhalation jets that are angled down and to the side. We tested this design on volatile organic compounds (VOC) in a small number of scenarios to validate the concept and understand how the system behaves. We show that in preliminary testing this dog-nose setup provides improvements over passive and solely inhalation sensing.
comment: 5 pages
Selection Mechanisms for Sequence Modeling using Linear State Space Models
Recent advancements in language modeling tasks have been driven by architectures such as Transformers and, more recently, by Selective State Space Models (SSMs). In this paper, we introduce an alternative selection mechanism inspired by control theory methodologies. Specifically, we propose a novel residual generator for selection, drawing an analogy to fault detection strategies in Linear Time-Invariant (LTI) systems. Unlike Mamba, which utilizes Linear Time-Varying (LTV) systems, our approach combines multiple LTI systems, preserving their beneficial properties during training while achieving comparable selectivity. To evaluate the effectiveness of the proposed architecture, we test its performance on synthetic tasks. While these tasks are not inherently critical, they serve as benchmarks to test the selectivity properties of different cores architecture. This work highlights the potential of integrating theoretical insights with experimental advancements, offering a complementary perspective to deep learning innovations at the intersection of control theory and machine learning.
comment: 9 pages, 5 figures
Sufficient Conditions for Detectability of Approximately Discretized Nonlinear Systems
In many sampled-data applications, observers are designed based on approximately discretized models of continuous-time systems, where usually only the discretized system is analyzed in terms of its detectability. In this paper, we show that if the continuous-time system satisfies certain linear matrix inequality (LMI) conditions, and the sampling period of the discretization scheme is sufficiently small, then the whole family of discretized systems (parameterized by the sampling period) satisfies analogous discrete-time LMI conditions that imply detectability. Our results are applicable to general discretization schemes, as long as they produce approximate models whose linearizations are in some sense consistent with the linearizations of the continuous-time ones. We explicitly show that the Euler and second-order Runge-Kutta methods satisfy this condition. A batch-reactor system example is provided to highlight the usefulness of our results from a practical perspective.
comment: 7 pages, submitted to 23rd European Control Conference
TransDF: Time-Series Forecasting Needs Transformed Label Alignment
Training time-series forecasting models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance. Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Extensive experiments demonstrate that TransDF achieves state-of-the-art performance and is compatible with various forecasting models. Code is available at https://anonymous.4open.science/r/TransDF-88CF.
Periodic Regulation of Linear Time-Delay Systems via Youla-Kučera Parametrization SC
The paper proposes an alternative way to achieve the Internal Model Principle (IMP) in contrast to the standard way, where a model of the signal one wishes to track/reject is directly substituted into the closed-loop. The proposed alternative approach relies on an already-existing stabilizing controller, which can be further augmented with a Youla-Ku\v{c}era parameter to let it implicitly admit a model of a signal without altering its stabilizing feature. Thus, with the proposed method, the standard design steps of realizing IMP are reversed. The validity and potential of the proposed approach are demonstrated by considering three different types of time-delay systems. It is shown through simulations that all considered unstable systems, despite the infinite-dimensional closed-loop, can be straightforwardly periodically regulated by augmenting their stabilizing PI controllers. Thanks to a specifically chosen structure for the Youla-Ku\v{c}era parameter, the required tuning can be done by solving a set of linear equations.
comment: 6 pages, 8 figures, submitted February 10, 2025, to Joint SSSC-TDS-COSY
HRSim: An agent-based simulation platform for high-capacity ride-sharing services
The rapid growth of ride-sharing services presents a promising solution to urban transportation challenges, such as congestion and carbon emissions. However, developing efficient operational strategies, such as pricing, matching, and fleet management, requires robust simulation tools that can replicate real-world dynamics at scale. Existing platforms often lack the capacity, flexibility, or open-source accessibility needed to support large-scale, high-capacity ride-sharing services. To address these gaps, we introduce HRSim, an open-source, agent-based High-capacity Ride-sharing Simulator. HRSim integrates real-world road networks and demand data to simulate dynamic ride-sharing operations, including pricing, routing, matching, and repositioning. Its module design supports both ride-sharing and solo-hailing service modes. Also, it includes a visualization module for real-time performance analysis. In addition, HRSim incorporates integer linear programming and heuristic algorithms, which can achieve large-scale simulations of high-capacity ride-sharing services. Applications demonstrate HRSim's utility in various perspectives, including quantifying carbon emissions, scaling ride-sharing performance, evaluating new strategies, etc. By bridging the gap between theoretical research and practical implementation, HRSim serves as a versatile testbed for policymakers and transportation network companies to optimize ride-sharing systems for efficiency and sustainability.
Risk-based approach to the Optimal Transmission Switching problem
This paper deals with the secure Optimal Transmission Switching (OTS) problem in situations where the TSO is forced to accept the risk that some contingencies may result in the de-energization of parts of the grid to avoid the violation of operational limits. This operational policy, which mainly applies to subtransmission systems, is first discussed. Then, a model of that policy is proposed that complements the classical MILP model of the N-1 secure OTS problem. It comprises a connectivity and notably a partial grid loss analysis for branch outage contingencies. Finally, its application to the IEEE 14-bus system is presented. Solutions similar to those observed in operation are reached by the algorithm, notably revealing the preventive-openings-cascade phenomenon.
comment: 2025 IEEE PowerTech Kiel
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
This research proposes methods for formulating and guaranteeing the resilience of long short-term memory (LSTM) networks, which can serve as a key technology in AI system quality assurance. We introduce a novel methodology applying incremental input-to-state stability ($\delta$ISS) to mathematically define and evaluate the resilience of LSTM against input perturbations. Key achievements include the development of a data-independent evaluation method and the demonstration of resilience control through adjustments to training parameters. This research presents concrete solutions to AI quality assurance from a control theory perspective, which can advance AI applications in control systems.
comment: 8 pages, 6 figures. Appendix: 16 pages. First three listed authors have equal contributions
ExARNN: An Environment-Driven Adaptive RNN for Learning Non-Stationary Power Dynamics
Non-stationary power system dynamics, influenced by renewable energy variability, evolving demand patterns, and climate change, are becoming increasingly complex. Accurately capturing these dynamics requires a model capable of adapting to environmental factors. Traditional models, including Recurrent Neural Networks (RNNs), lack efficient mechanisms to encode external factors, such as time or environmental data, for dynamic adaptation. To address this, we propose the External Adaptive RNN (ExARNN), a novel framework that integrates external data (e.g., weather, time) to continuously adjust the parameters of a base RNN. ExARNN achieves this through a hierarchical hypernetwork design, using Neural Controlled Differential Equations (NCDE) to process external data and generate RNN parameters adaptively. This approach enables ExARNN to handle inconsistent timestamps between power and external measurements, ensuring continuous adaptation. Extensive forecasting tests demonstrate ExARNN's superiority over established baseline models.
comment: 5 pages, 3 figures, conference
Autonomous Circular Drift Control for 4WD-4WS Vehicles Without Precomputed Drifting Equilibrium
Under extreme conditions, autonomous drifting enables vehicles to follow predefined paths at large slip angles, significantly enhancing the control system's capability to handle hazardous scenarios. Four-wheel-drive and four-wheel-steering (4WD-4WS) vehicles, which have been extensively studied, offer superior path-following precision and enhanced maneuverability under challenging driving conditions. In this paper, a hierarchical drifting controller is proposed for 4WD-4WS vehicles to track both path and velocity without relying on precomputed drifting equilibrium. The controller is structured into two layers: a trajectory tracking layer and an actuator regulation layer. The first layer generates the desired tire forces in the vehicle body frame, while the second layer converts these desired tire forces into steering angle commands and torque commands for the front and rear motors. The effectiveness and robustness of the proposed controller are validated through simulation.
A Dynamic Phasor Framework for Analysis of Grid-Forming Converter Connected to Series-Compensated Line
A dynamic phasor (DP) framework for time-domain and frequency-domain analyses of grid-forming converters (GFCs) connected to series-compensated transmission lines is proposed. The proposed framework can capture the behavior of GFCs subjected to unbalanced short circuit faults in presence of different current limiting strategies. Moreover, the linearizability and time invariance of this framework allows us to perform eigen decomposition, which is a powerful tool for root-cause analysis and control design. We show that a certain degree of series compensation may result in poorly-damped oscillations in presence of the grid-forming converter. A participation factor analysis using the DP model reveals that the point of interconnection voltage angle is dominant in this mode. Eigenvalue sensitivity analysis of controller parameters shows that reducing the power-frequency droop coefficient is most effective in stabilizing the poorly-damped mode. Detailed validation with electromagnetic transient model demonstrates the accuracy of the proposed framework.
State of health prediction of lithium-ion batteries for driving conditions based on full parameter domain sparrow search algorithm and dual-module bidirectional gated recurrent unit
Aiming at the state of health (SOH) prediction of lithium-ion batteries (LiBs) for electric vehicles (EVs), this paper proposes a fusion model of a dual-module bidirectional gated recurrent unit (BiGRU) and sparrow search algorithm (SSA) with full parameter domain optimization. With the help of Spearman correlation analysis and ablation experiments, the indirect health indicator (HI) that can characterize the battery degradation is extracted first based on the incremental capacity (IC) curves of the Oxford battery dataset, which simulates the driving conditions. On this basis, the filtered one-dimensional HI is inputted into the dual-module BiGRU for learning the pre- and post-textual information of the input sequence and extracting the sequence features. In order to combine the different hyperparameters in the dual-module BiGRU, SSA is used to optimize the hyperparameters in the full parameter domain. The proposed SSA-BiGRU model combines the advantages and structures of SSA and BiGRU to achieve the highly accurate SOH prediction of LiBs. Studies based on the Oxford battery dataset have shown that the SSA-BiGRU model has higher accuracy, better robustness and generalization ability. Moreover, the proposed SSA-BiGRU model is tested on a real road-driven EV charging dataset and accurate SOH prediction are obtained.
Sampled-data Systems: Stability, Contractivity and Single-iteration Suboptimal MPC
This paper analyzes the stability of interconnected continuous-time (CT) and discrete-time (DT) systems coupled through sampling and zero-order hold mechanisms. The DT system updates its output at regular intervals $T>0$ by applying an $n$-fold composition of a given map. This setup is motivated by online and sampled-data implementations of optimization-based controllers - particularly model predictive control (MPC) - where the DT system models $n$ iterations of an algorithm approximating the solution of an optimization problem. We introduce the concept of a reduced model, defined as the limiting behavior of the sampled-data system as $T \to 0^+$ and $n \to +\infty$. Our main theoretical contribution establishes that when the reduced model is contractive, there exists a threshold duration $T(n)$ for each iteration count $n$ such that the CT-DT interconnection achieves exponential stability for all sampling periods $T < T(n)$. Finally, under the stronger condition that both the CT and DT systems are contractive, we show exponential stability of their interconnection using a small-gain argument. Our theoretical results provide new insights into suboptimal MPC stability, showing that convergence guarantees hold even when using a single iteration of the optimization algorithm - a practically significant finding for real-time control applications.
MorphEUS: Morphable Omnidirectional Unmanned System
Omnidirectional aerial vehicles (OMAVs) have opened up a wide range of possibilities for inspection, navigation, and manipulation applications using drones. In this paper, we introduce MorphEUS, a morphable co-axial quadrotor that can control position and orientation independently with high efficiency. It uses a paired servo motor mechanism for each rotor arm, capable of pointing the vectored-thrust in any arbitrary direction. As compared to the \textit{state-of-the-art} OMAVs, we achieve higher and more uniform force/torque reachability with a smaller footprint and minimum thrust cancellations. The overactuated nature of the system also results in resiliency to rotor or servo-motor failures. The capabilities of this quadrotor are particularly well-suited for contact-based infrastructure inspection and close-proximity imaging of complex geometries. In the accompanying control pipeline, we present theoretical results for full controllability, almost-everywhere exponential stability, and thrust-energy optimality. We evaluate our design and controller on high-fidelity simulations showcasing the trajectory-tracking capabilities of the vehicle during various tasks. Supplementary details and experimental videos are available on the project webpage.
ZV-Sim: Probabilistic Simulation Framework for Pre-emergent Novel Zoonose Tracking
ZV-Sim is an open-source, modular Python framework for probabilistic simulation and analysis of pre-emergent novel zoonotic diseases using pervasive sensing data. It incorporates customizable Human and Animal Presence agents that leverage known and simulated location data, contact networks, and illness reports to assess and predict disease origins and spread. The framework supports Monte Carlo experiments to analyze outcomes with various user-defined movement and probability models. Although initial models are basic and illustrative, ZV-Sim's extensible design facilitates the integration of more sophisticated models as richer data become available, enhancing future capabilities in zoonotic disease tracking. The source code is publicly available \href{https://github.com/jmaff/zv-sim}{\underline{\textit{here}}}.
comment: 5 pages
Gaussian Processes in Power Systems: Techniques, Applications, and Future Works
The increasing integration of renewable energy sources (RESs) and distributed energy resources (DERs) has significantly heightened operational complexity and uncertainty in modern power systems. Concurrently, the widespread deployment of smart meters, phasor measurement units (PMUs) and other sensors has generated vast spatiotemporal data streams, enabling advanced data-driven analytics and decision-making in grid operations. In this context, Gaussian processes (GPs) have emerged as a powerful probabilistic framework, offering uncertainty quantification, non-parametric modeling, and predictive capabilities to enhance power system analysis and control. This paper presents a comprehensive review of GP techniques and their applications in power system operation and control. GP applications are reviewed across three key domains: GP-based modeling, risk assessment, and optimization and control. These areas serve as representative examples of how GP can be utilized in power systems. Furthermore, critical challenges in GP applications are discussed, and potential research directions are outlined to facilitate future power system operations.
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
An Operator-Theoretic Framework to Simulate Neuromorphic Circuits
Splitting algorithms are well-established in convex optimization and are designed to solve large-scale problems. Using such algorithms to simulate the behavior of nonlinear circuit networks provides scalable methods for the simulation and design of neuromorphic systems. For circuits made of linear capacitors and inductors with nonlinear resistive elements, we propose a splitting that breaks the network into its LTI lossless component and its static resistive component. This splitting has both physical and algorithmic advantages and allows for separate calculations in the time domain and in the frequency domain. To demonstrate the scalability of this approach, a network made from one hundred neurons modeled by the well-known FitzHugh-Nagumo circuit with all-to-all diffusive coupling is simulated.
comment: Accepted to CDC 2024
Adaptive Input Design for Nonlinear System Identification with Operational Constraints
We consider the problem of joint input design and parameter estimation for identifying nonlinear system models through the sequential acquisition of measurements while adhering to system constraints. We utilize a receding horizon approach and propose a new scale-invariant input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential parameter estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
Robust Nash equilibrium seeking based on semi-Markov switching topologies
This paper investigates a distributed robust Nash Equilibrium (NE) seeking problem in fluctuating environments. Specifically, the players, subject to the second-order dynamics, are considered to be influenced by external disturbances and uncertain dynamics while communicating via semi-Markov switching topologies. In such constantly changing network circumstances, the existence of disturbances and uncertain dynamics may directly affect the performance of most existing NE seeking algorithms. Moreover, the semi-Markov switching topologies may cause communication uncertainty, which are considered in NE seeking for the first time. To accommodate the above concerns, the following targets require to be reached simultaneously: (1) Disturbances and uncertain dynamics rejection in finite time; (2) Distributed estimation on unknown information required for players' cost functions; (3) A reasonable estimation consensus protocol under semi-Markov switching; (4) NE seeking for the second-order players. By combining supertwisting-based Integral Sliding-Mode Control (ISMC) with average consensus tracking, a novel robust NE seeking algorithm is constructed, incorporating an effective leader-follower consensus protocol. Furthermore, to lessen dispensable information transmission, a sampled-data-based event-triggered mechanism is introduced. Incorporating the advantages of both semi-Markov switching and event-triggered mechanism, another NE seeking algorithm is proposed. Through designing an appropriate Lyapunov-Krasovskii functional, it is shown that the leader-follower consensus can be achieved in the mean-square sense under event-triggered mechanism. Finally, a connectivity control game is formulated to illustrate the validity of the designed algorithms.
comment: This work has been submitted to the lEEE for possible publication
Toward Near-Globally Optimal Nonlinear Model Predictive Control via Diffusion Models
Achieving global optimality in nonlinear model predictive control (NMPC) is challenging due to the non-convex nature of the underlying optimization problem. Since commonly employed local optimization techniques depend on carefully chosen initial guesses, this non-convexity often leads to suboptimal performance resulting from local optima. To overcome this limitation, we propose a novel diffusion model-based approach for near-globally optimal NMPC consisting of an offline and an online phase. The offline phase employs a local optimizer to sample from the distribution of optimal NMPC control sequences along generated system trajectories through random initial guesses. Subsequently, the generated diverse dataset is used to train a diffusion model to reflect the multi-modal distribution of optima. In the online phase, the trained model is leveraged to efficiently perform a variant of random shooting optimization to obtain near-globally optimal control sequences without relying on any initial guesses or online NMPC solving. The effectiveness of our approach is illustrated in a numerical simulation indicating high performance benefits compared to direct neural network approximations of NMPC and significantly lower computation times than online solving NMPC using global optimizers.
comment: This paper has been accepted by the 2025 7th Annual Learning for Dynamics & Control Conference (L4DC) as an oral presentation and has been nominated for the best paper award
The Barzilai-Borwein Method for Distributed Optimization over Unbalanced Directed Networks
This paper studies optimization problems over multi-agent systems, in which all agents cooperatively minimize a global objective function expressed as a sum of local cost functions. Each agent in the systems uses only local computation and communication in the overall process without leaking their private information. Based on the Barzilai-Borwein (BB) method and multi-consensus inner loops, a distributed algorithm with the availability of larger stepsizes and accelerated convergence, namely ADBB, is proposed. Moreover, owing to employing only row-stochastic weight matrices, ADBB can resolve the optimization problems over unbalanced directed networks without requiring the knowledge of neighbors' out-degree for each agent. Via establishing contraction relationships between the consensus error, the optimality gap, and the gradient tracking error, ADBB is theoretically proved to converge linearly to the globally optimal solution. A real-world data set is used in simulations to validate the correctness of the theoretical analysis.
comment: 33 pages, 8 figures
Koopman Operators in Robot Learning
Koopman operator theory offers a rigorous treatment of dynamics and has been emerging as an alternative modeling and learning-based control method across various robotics sub-domains. Due to its ability to represent nonlinear dynamics as a linear (but higher-dimensional) operator, Koopman theory offers a fresh lens through which to understand and tackle the modeling and control of complex robotic systems. Moreover, it enables incremental updates and is computationally inexpensive, thus making it particularly appealing for real-time applications and online active learning. This review delves deeply into the foundations of Koopman operator theory and systematically builds a bridge from theoretical principles to practical robotic applications. We begin by explaining the mathematical underpinnings of the Koopman framework and discussing approximation approaches for incorporating inputs into Koopman-based modeling. Foundational considerations, such as data collection strategies as well as the design of lifting functions for effective system embedding, are also discussed. We then explore how Koopman-based models serve as a unifying tool for a range of robotics tasks, including model-based control, real-time state estimation, and motion planning. The review proceeds to a survey of cutting-edge research that demonstrates the versatility and growing impact of Koopman methods across diverse robotics sub-domains: from aerial and legged platforms to manipulators, soft-bodied systems, and multi-agent networks. A presentation of more advanced theoretical topics, necessary to push forward the overall framework, is included. Finally, we reflect on some key open challenges that remain and articulate future research directions that will shape the next phase of Koopman-inspired robotics. To support practical adoption, we provide a hands-on tutorial with executable code at https://shorturl.at/ouE59.
comment: Submitted to TRO
The Cesàro Value Iteration
In this paper, we address the problem of undiscouted infinite-horizon optimal control for deterministic systems where the classic value iteration does not converge. For such systems, we propose to use the Ces\`aro mean to define the infinite-horizon optimal control problem and the corresponding infinite-horizon value function. Moreover, for this value function, we introduce the Ces\`aro value iteration and prove its convergence for the special case of systems with periodic optimal operating behavior.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
A complex spatial frequency approach to optimal control of finite-extent linear evolution systems
We consider the linear quadratic regulator (LQR) for one-dimensional linear evolution partial differential equations (PDEs) on a finite interval in space. The control is applied as an additive forcing term to PDEs. Existing methods for closed-form optimal control only apply to homogeneous (zero) boundary conditions, often resulting in series representations. In this paper, we consider general smooth boundary conditions. We use the unified transform, namely the Fourier transform restricted to the bounded spatial domain, to decouple PDEs into a family of ordinary differential equations (ODEs) parameterized by complex spatial frequency variables. Then, optimal control in the frequency domain is derived using LQR theory for ODEs. The inverse Fourier transform leads to non-causal terms in optimal control corresponding to integrals, over the real line, of future values of unspecified boundary conditions. To eliminate this non-causality, we deform the integrals to well-constructed contours in the complex plane along which the contribution of unknowns vanishes. For the reaction-diffusion equation, we show that the integral representation can be reformulated as a series representation, which leads to a state-feedback convolution form for optimal control, with the boundary conditions appearing as an additive term. In numerical experiments, we illustrate the computational advantages of the integral representation in comparison to the series representation and structural properties of the convolution kernel.
comment: 22 pages, 9 figures
Robotics
Interactive Post-Training for Vision-Language-Action Models
We introduce RIPT-VLA, a simple and scalable reinforcement-learning-based interactive post-training paradigm that fine-tunes pretrained Vision-Language-Action (VLA) models using only sparse binary success rewards. Existing VLA training pipelines rely heavily on offline expert demonstration data and supervised imitation, limiting their ability to adapt to new tasks and environments under low-data regimes. RIPT-VLA addresses this by enabling interactive post-training with a stable policy optimization algorithm based on dynamic rollout sampling and leave-one-out advantage estimation. RIPT-VLA has the following characteristics. First, it applies to various VLA models, resulting in an improvement on the lightweight QueST model by 21.2%, and the 7B OpenVLA-OFT model to an unprecedented 97.5% success rate. Second, it is computationally efficient and data-efficient: with only one demonstration, RIPT-VLA enables an unworkable SFT model (4%) to succeed with a 97% success rate within 15 iterations. Furthermore, we demonstrate that the policy learned by RIPT-VLA generalizes across different tasks and scenarios and is robust to the initial state context. These results highlight RIPT-VLA as a practical and effective paradigm for post-training VLA models through minimal supervision.
comment: Project page: https://ariostgx.github.io/ript_vla/
CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning
Learning latent motion from Internet videos is crucial for building generalist robots. However, existing discrete latent action methods suffer from information loss and struggle with complex and fine-grained dynamics. We propose CoMo, which aims to learn more informative continuous motion representations from diverse, internet-scale videos. CoMo employs a early temporal feature difference mechanism to prevent model collapse and suppress static appearance noise, effectively discouraging shortcut learning problem. Furthermore, guided by the information bottleneck principle, we constrain the latent motion embedding dimensionality to achieve a better balance between retaining sufficient action-relevant information and minimizing the inclusion of action-irrelevant appearance noise. Additionally, we also introduce two new metrics for more robustly and affordably evaluating motion and guiding motion learning methods development: (i) the linear probing MSE of action prediction, and (ii) the cosine similarity between past-to-current and future-to-current motion embeddings. Critically, CoMo exhibits strong zero-shot generalization, enabling it to generate continuous pseudo actions for previously unseen video domains. This capability facilitates unified policy joint learning using pseudo actions derived from various action-less video datasets (such as cross-embodiment videos and, notably, human demonstration videos), potentially augmented with limited labeled robot data. Extensive experiments show that policies co-trained with CoMo pseudo actions achieve superior performance with both diffusion and autoregressive architectures in simulated and real-world settings.
comment: 18 pages, 7 figures
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Out-of-distribution (OOD) detection and segmentation are crucial for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. While prior research has primarily focused on unimodal image data, real-world applications are inherently multimodal, requiring the integration of multiple modalities for improved OOD detection. A key challenge is the lack of supervision signals from unknown data, leading to overconfident predictions on OOD samples. To address this challenge, we propose Feature Mixing, an extremely simple and fast method for multimodal outlier synthesis with theoretical support, which can be further optimized to help the model better distinguish between in-distribution (ID) and OOD data. Feature Mixing is modality-agnostic and applicable to various modality combinations. Additionally, we introduce CARLA-OOD, a novel multimodal dataset for OOD segmentation, featuring synthetic OOD objects across diverse scenes and weather conditions. Extensive experiments on SemanticKITTI, nuScenes, CARLA-OOD datasets, and the MultiOOD benchmark demonstrate that Feature Mixing achieves state-of-the-art performance with a $10 \times$ to $370 \times$ speedup. Our source code and dataset will be available at https://github.com/mona4399/FeatureMixing.
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Equivariant models have recently been shown to improve the data efficiency of diffusion policy by a significant margin. However, prior work that explored this direction focused primarily on point cloud inputs generated by multiple cameras fixed in the workspace. This type of point cloud input is not compatible with the now-common setting where the primary input modality is an eye-in-hand RGB camera like a GoPro. This paper closes this gap by incorporating into the diffusion policy model a process that projects features from the 2D RGB camera image onto a sphere. This enables us to reason about symmetries in SO(3) without explicitly reconstructing a point cloud. We perform extensive experiments in both simulation and the real world that demonstrate that our method consistently outperforms strong baselines in terms of both performance and sample efficiency. Our work is the first SO(3)-equivariant policy learning framework for robotic manipulation that works using only monocular RGB inputs.
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning
We introduce $\infty$-THOR, a new framework for long-horizon embodied tasks that advances long-context understanding in embodied AI. $\infty$-THOR provides: (1) a generation framework for synthesizing scalable, reproducible, and unlimited long-horizon trajectories; (2) a novel embodied QA task, Needle(s) in the Embodied Haystack, where multiple scattered clues across extended trajectories test agents' long-context reasoning ability; and (3) a long-horizon dataset and benchmark suite featuring complex tasks that span hundreds of environment steps, each paired with ground-truth action sequences. To enable this capability, we explore architectural adaptations, including interleaved Goal-State-Action modeling, context extension techniques, and Context Parallelism, to equip LLM-based agents for extreme long-context reasoning and interaction. Experimental results and analyses highlight the challenges posed by our benchmark and provide insights into training strategies and model behaviors under long-horizon conditions. Our work provides a foundation for the next generation of embodied AI systems capable of robust, long-term reasoning and planning.
UAV See, UGV Do: Aerial Imagery and Virtual Teach Enabling Zero-Shot Ground Vehicle Repeat IROS 2025
This paper presents Virtual Teach and Repeat (VirT&R): an extension of the Teach and Repeat (T&R) framework that enables GPS-denied, zero-shot autonomous ground vehicle navigation in untraversed environments. VirT&R leverages aerial imagery captured for a target environment to train a Neural Radiance Field (NeRF) model so that dense point clouds and photo-textured meshes can be extracted. The NeRF mesh is used to create a high-fidelity simulation of the environment for piloting an unmanned ground vehicle (UGV) to virtually define a desired path. The mission can then be executed in the actual target environment by using NeRF-derived point cloud submaps associated along the path and an existing LiDAR Teach and Repeat (LT&R) framework. We benchmark the repeatability of VirT&R on over 12 km of autonomous driving data using physical markings that allow a sim-to-real lateral path-tracking error to be obtained and compared with LT&R. VirT&R achieved measured root mean squared errors (RMSE) of 19.5 cm and 18.4 cm in two different environments, which are slightly less than one tire width (24 cm) on the robot used for testing, and respective maximum errors were 39.4 cm and 47.6 cm. This was done using only the NeRF-derived teach map, demonstrating that VirT&R has similar closed-loop path-tracking performance to LT&R but does not require a human to manually teach the path to the UGV in the actual environment.
comment: 8 pages, 7 figures, submitted to IROS 2025
RealEngine: Simulating Autonomous Driving in Realistic Context
Driving simulation plays a crucial role in developing reliable driving agents by providing controlled, evaluative environments. To enable meaningful assessments, a high-quality driving simulator must satisfy several key requirements: multi-modal sensing capabilities (e.g., camera and LiDAR) with realistic scene rendering to minimize observational discrepancies; closed-loop evaluation to support free-form trajectory behaviors; highly diverse traffic scenarios for thorough evaluation; multi-agent cooperation to capture interaction dynamics; and high computational efficiency to ensure affordability and scalability. However, existing simulators and benchmarks fail to comprehensively meet these fundamental criteria. To bridge this gap, this paper introduces RealEngine, a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques to achieve realistic and flexible closed-loop simulation in the driving context. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios through flexible scene composition. This synergistic fusion of scene reconstruction and view synthesis enables photorealistic rendering across multiple sensor modalities, ensuring both perceptual fidelity and geometric accuracy. Building upon this environment, RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction, collectively forming a reliable and comprehensive benchmark for evaluating the real-world performance of driving agents.
FlashBack: Consistency Model-Accelerated Shared Autonomy
Shared autonomy is an enabling technology that provides users with control authority over robots that would otherwise be difficult if not impossible to directly control. Yet, standard methods make assumptions that limit their adoption in practice-for example, prior knowledge of the user's goals or the objective (i.e., reward) function that they wish to optimize, knowledge of the user's policy, or query-level access to the user during training. Diffusion-based approaches to shared autonomy do not make such assumptions and instead only require access to demonstrations of desired behaviors, while allowing the user to maintain control authority. However, these advantages have come at the expense of high computational complexity, which has made real-time shared autonomy all but impossible. To overcome this limitation, we propose Consistency Shared Autonomy (CSA), a shared autonomy framework that employs a consistency model-based formulation of diffusion. Key to CSA is that it employs the distilled probability flow of ordinary differential equations (PF ODE) to generate high-fidelity samples in a single step. This results in inference speeds significantly than what is possible with previous diffusion-based approaches to shared autonomy, enabling real-time assistance in complex domains with only a single function evaluation. Further, by intervening on flawed actions at intermediate states of the PF ODE, CSA enables varying levels of assistance. We evaluate CSA on a variety of challenging simulated and real-world robot control problems, demonstrating significant improvements over state-of-the-art methods both in terms of task performance and computational efficiency.
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only
Improving the performance of pre-trained policies through online reinforcement learning (RL) is a critical yet challenging topic. Existing online RL fine-tuning methods require continued training with offline pretrained Q-functions for stability and performance. However, these offline pretrained Q-functions commonly underestimate state-action pairs beyond the offline dataset due to the conservatism in most offline RL methods, which hinders further exploration when transitioning from the offline to the online setting. Additionally, this requirement limits their applicability in scenarios where only pre-trained policies are available but pre-trained Q-functions are absent, such as in imitation learning (IL) pre-training. To address these challenges, we propose a method for efficient online RL fine-tuning using solely the offline pre-trained policy, eliminating reliance on pre-trained Q-functions. We introduce PORL (Policy-Only Reinforcement Learning Fine-Tuning), which rapidly initializes the Q-function from scratch during the online phase to avoid detrimental pessimism. Our method not only achieves competitive performance with advanced offline-to-online RL algorithms and online RL approaches that leverage data or policies prior, but also pioneers a new path for directly fine-tuning behavior cloning (BC) policies.
Perceptual Quality Assessment for Embodied AI
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various distortions in the Real-world limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, there is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic: IQA for Embodied AI. Specifically, we (1) based on the Mertonian system and meta-cognitive theory, constructed a perception-cognition-decision-execution pipeline and defined a comprehensive subjective score collection process; (2) established the Embodied-IQA database, containing over 36k reference/distorted image pairs, with more than 5m fine-grained annotations provided by Vision Language Models/Vision Language Action-models/Real-world robots; (3) trained and validated the performance of mainstream IQA methods on Embodied-IQA, demonstrating the need to develop more accurate quality indicators for Embodied AI. We sincerely hope that through evaluation, we can promote the application of Embodied AI under complex distortions in the Real-world. Project page: https://github.com/lcysyzxdxc/EmbodiedIQA
D-LIO: 6DoF Direct LiDAR-Inertial Odometry based on Simultaneous Truncated Distance Field Mapping
This paper presents a new approach for 6DoF Direct LiDAR-Inertial Odometry (D-LIO) based on the simultaneous mapping of truncated distance fields on CPU. Such continuous representation (in the vicinity of the points) enables working with raw 3D LiDAR data online, avoiding the need of LiDAR feature selection and tracking, simplifying the odometry pipeline and easily generalizing to many scenarios. The method is based on the proposed Fast Truncated Distance Field (Fast-TDF) method as a convenient tool to represent the environment. Such representation enables i) solving the LiDAR point-cloud registration as a nonlinear optimization process without the need of selecting/tracking LiDAR features in the input data, ii) simultaneously producing an accurate truncated distance field map of the environment, and iii) updating such map at constant time independently of its size. The approach is tested using open datasets, aerial and ground. It is also benchmarked against other state-of-the-art odometry approaches, demonstrating the same or better level of accuracy with the added value of an online-generated TDF representation of the environment, that can be used for other robotics tasks as planning or collision avoidance. The source code is publicly available at https://anonymous.4open.science/r/D-LIO
comment: 9 pages, 4 figures and 43 references
MEbots: Integrating a RISC-V Virtual Platform with a Robotic Simulator for Energy-aware Design
Virtual Platforms (VPs) enable early software validation of autonomous systems' electronics, reducing costs and time-to-market. While many VPs support both functional and non-functional simulation (e.g., timing, power), they lack the capability of simulating the environment in which the system operates. In contrast, robotics simulators lack accurate timing and power features. This twofold shortcoming limits the effectiveness of the design flow, as the designer can not fully evaluate the features of the solution under development. This paper presents a novel, fully open-source framework bridging this gap by integrating a robotics simulator (Webots) with a VP for RISC-V-based systems (MESSY). The framework enables a holistic, mission-level, energy-aware co-simulation of electronics in their surrounding environment, streamlining the exploration of design configurations and advanced power management policies.
Joint Magnetometer-IMU Calibration via Maximum A Posteriori Estimation
This paper presents a new approach for jointly calibrating magnetometers and inertial measurement units, focusing on improving calibration accuracy and computational efficiency. The proposed method formulates the calibration problem as a maximum a posteriori estimation problem, treating both the calibration parameters and orientation trajectory of the sensors as unknowns. This formulation enables efficient optimization with closed-form derivatives. The method is compared against two state-of-the-art approaches in terms of computational complexity and estimation accuracy. Simulation results demonstrate that the proposed method achieves lower root mean square error in calibration parameters while maintaining competitive computational efficiency. Further validation through real-world experiments confirms the practical benefits of our approach: it effectively reduces position drift in a magnetic field-aided inertial navigation system by more than a factor of two on most datasets. Moreover, the proposed method calibrated 30 magnetometers in less than 2 minutes. The contributions include a new calibration method, an analysis of existing methods, and a comprehensive empirical evaluation. Datasets and algorithms are made publicly available to promote reproducible research.
Monitoring Electrostatic Adhesion Forces via Acoustic Pressure
Electrostatic adhesion is widely used in mobile robotics, haptics, and robotic end effectors for its adaptability to diverse substrates and low energy consumption. Force sensing is important for feedback control, interaction, and monitoring in the EA system. However, EA force monitoring often relies on bulky and expensive sensors, increasing the complexity and weight of the entire system. This paper presents an acoustic-pressure-based method to monitor EA forces without contacting the adhesion pad. When the EA pad is driven by a bipolar square-wave voltage to adhere a conductive object, periodic acoustic pulses arise from the EA system. We employed a microphone to capture these acoustic pressure signals and investigate the influence of peak pressure values. Results show that the peak value of acoustic pressure increased with the mass and contact area of the adhered object, as well as with the amplitude and frequency of the driving voltage. We applied this technique to mass estimation of various objects and simultaneous monitoring of two EA systems. Then, we integrated this technique into an EA end effector that enables monitoring the change of adhered object mass during transport. The proposed technique offers a low-cost, non-contact, and multi-object monitoring solution for EA end effectors in handling tasks.
comment: 6 pages, 7 figures
Safe Uncertainty-Aware Learning of Robotic Suturing
Robot-Assisted Minimally Invasive Surgery is currently fully manually controlled by a trained surgeon. Automating this has great potential for alleviating issues, e.g., physical strain, highly repetitive tasks, and shortages of trained surgeons. For these reasons, recent works have utilized Artificial Intelligence methods, which show promising adaptability. Despite these advances, there is skepticism of these methods because they lack explainability and robust safety guarantees. This paper presents a framework for a safe, uncertainty-aware learning method. We train an Ensemble Model of Diffusion Policies using expert demonstrations of needle insertion. Using an Ensemble model, we can quantify the policy's epistemic uncertainty, which is used to determine Out-Of-Distribution scenarios. This allows the system to release control back to the surgeon in the event of an unsafe scenario. Additionally, we implement a model-free Control Barrier Function to place formal safety guarantees on the predicted action. We experimentally evaluate our proposed framework using a state-of-the-art robotic suturing simulator. We evaluate multiple scenarios, such as dropping the needle, moving the camera, and moving the phantom. The learned policy is robust to these perturbations, showing corrective behaviors and generalization, and it is possible to detect Out-Of-Distribution scenarios. We further demonstrate that the Control Barrier Function successfully limits the action to remain within our specified safety set in the case of unsafe predictions.
Find the Fruit: Designing a Zero-Shot Sim2Real Deep RL Planner for Occlusion Aware Plant Manipulation
This paper presents an end-to-end deep reinforcement learning (RL) framework for occlusion-aware robotic manipulation in cluttered plant environments. Our approach enables a robot to interact with a deformable plant to reveal hidden objects of interest, such as fruits, using multimodal observations. We decouple the kinematic planning problem from robot control to simplify zero-shot sim2real transfer for the trained policy. Our results demonstrate that the trained policy, deployed using our framework, achieves up to 86.7% success in real-world trials across diverse initial conditions. Our findings pave the way toward autonomous, perception-driven agricultural robots that intelligently interact with complex foliage plants to "find the fruit" in challenging occluded scenarios, without the need for explicitly designed geometric and dynamic models of every plant scenario.
comment: 18 Pages, 15 Figures, 5 Tables
ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models
Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. However, existing methods rely heavily on costly human-annotated training datasets, which limits their generalization and causes them to struggle in out-of-domain (OOD) scenarios, reducing real-world adaptability. To address these challenges, we propose ManipLVM-R1, a novel reinforcement learning framework that replaces traditional supervision with Reinforcement Learning using Verifiable Rewards (RLVR). By directly optimizing for task-aligned outcomes, our method enhances generalization and physical reasoning while removing the dependence on costly annotations. Specifically, we design two rule-based reward functions targeting key robotic manipulation subtasks: an Affordance Perception Reward to enhance localization of interaction regions, and a Trajectory Match Reward to ensure the physical plausibility of action paths. These rewards provide immediate feedback and impose spatial-logical constraints, encouraging the model to go beyond shallow pattern matching and instead learn deeper, more systematic reasoning about physical interactions.
comment: 13pages
Human-like Semantic Navigation for Autonomous Driving using Knowledge Representation and Large Language Models
Achieving full automation in self-driving vehicles remains a challenge, especially in dynamic urban environments where navigation requires real-time adaptability. Existing systems struggle to handle navigation plans when faced with unpredictable changes in road layouts, spontaneous detours, or missing map data, due to their heavy reliance on predefined cartographic information. In this work, we explore the use of Large Language Models to generate Answer Set Programming rules by translating informal navigation instructions into structured, logic-based reasoning. ASP provides non-monotonic reasoning, allowing autonomous vehicles to adapt to evolving scenarios without relying on predefined maps. We present an experimental evaluation in which LLMs generate ASP constraints that encode real-world urban driving logic into a formal knowledge representation. By automating the translation of informal navigation instructions into logical rules, our method improves adaptability and explainability in autonomous navigation. Results show that LLM-driven ASP rule generation supports semantic-based decision-making, offering an explainable framework for dynamic navigation planning that aligns closely with how humans communicate navigational intent.
comment: 7 pages, 5 figures, submitted for IEEE conference
Unified Multi-Rate Model Predictive Control for a Jet-Powered Humanoid Robot
We propose a novel Model Predictive Control (MPC) framework for a jet-powered flying humanoid robot. The controller is based on a linearised centroidal momentum model to represent the flight dynamics, augmented with a second-order nonlinear model to explicitly account for the slow and nonlinear dynamics of jet propulsion. A key contribution is the introduction of a multi-rate MPC formulation that handles the different actuation rates of the robot's joints and jet engines while embedding the jet dynamics directly into the predictive model. We validated the framework using the jet-powered humanoid robot iRonCub, performing simulations in Mujoco; the simulation results demonstrate the robot's ability to recover from external disturbances and perform stable, non-abrupt flight manoeuvres, validating the effectiveness of the proposed approach.
comment: 8 pages, 6 figures
SpineWave: Harnessing Fish Rigid-Flexible Spinal Kinematics for Enhancing Biomimetic Robotic Locomotion
Fish have endured millions of years of evolution, and their distinct rigid-flexible body structures offer inspiration for overcoming challenges in underwater robotics, such as limited mobility, high energy consumption, and adaptability. This paper introduces SpineWave, a biomimetic robotic fish featuring a fish-spine-like rigid-flexible transition structure. The structure integrates expandable fishbone-like ribs and adjustable magnets, mimicking the stretch and recoil of fish muscles to balance rigidity and flexibility. In addition, we employed an evolutionary algorithm to optimize the hydrodynamics of the robot, achieving significant improvements in swimming performance. Real-world tests demonstrated robustness and potential for environmental monitoring, underwater exploration, and industrial inspection. These tests established SpineWave as a transformative platform for aquatic robotics.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem for its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently Model-based Reinforcement Learning (MBRL) have demonstrated promising results in neural planning; however, these methods typically require privileged information as input rather than raw sensor data. We fill this gap by designing Raw2Drive, a dual-stream MBRL approach. Initially, we efficiently train an auxiliary privileged world model paired with a neural planner that uses privileged information as input. Subsequently, we introduce a raw sensor world model trained via our proposed Guidance Mechanism, which ensures consistency between the raw sensor world model and the privileged world model during rollouts. Finally, the raw sensor world model combines the prior knowledge embedded in the heads of the privileged world model to effectively guide the training of the raw sensor policy. Raw2Drive is so far the only RL based end-to-end method on CARLA Leaderboard 2.0, and Bench2Drive and it achieves state-of-the-art performance.
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving
Reinforcement learning (RL)-based autonomous driving policy learning faces critical limitations such as low sample efficiency and poor generalization; its reliance on online interactions and trial-and-error learning is especially unacceptable in safety-critical scenarios. Existing methods including safe RL often fail to capture the true semantic meaning of "safety" in complex driving contexts, leading to either overly conservative driving behavior or constraint violations. To address these challenges, we propose VL-SAFE, a world model-based safe RL framework with Vision-Language model (VLM)-as-safety-guidance paradigm, designed for offline safe policy learning. Specifically, we construct offline datasets containing data collected by expert agents and labeled with safety scores derived from VLMs. A world model is trained to generate imagined rollouts together with safety estimations, allowing the agent to perform safe planning without interacting with the real environment. Based on these imagined trajectories and safety evaluations, actor-critic learning is conducted under VLM-based safety guidance to optimize the driving policy more safely and efficiently. Extensive evaluations demonstrate that VL-SAFE achieves superior sample efficiency, generalization, safety, and overall performance compared to existing baselines. To the best of our knowledge, this is the first work that introduces a VLM-guided world model-based approach for safe autonomous driving. The demo video and code can be accessed at: https://ys-qu.github.io/vlsafe-website/
TacCompress: A Benchmark for Multi-Point Tactile Data Compression in Dexterous Manipulation
Though robotic dexterous manipulation has progressed substantially recently, challenges like in-hand occlusion still necessitate fine-grained tactile perception, leading to the integration of more tactile sensors into robotic hands. Consequently, the increased data volume imposes substantial bandwidth pressure on signal transmission from the hand's controller. However, the acquisition and compression of multi-point tactile signals based on the dexterous hands' physical structures have not been thoroughly explored. In this paper, our contributions are twofold. First, we introduce a Multi-Point Tactile Dataset for Dexterous Hand Grasping (Dex-MPTD). This dataset captures tactile signals from multiple contact sensors across various objects and grasping poses, offering a comprehensive benchmark for advancing dexterous robotic manipulation research. Second, we investigate both lossless and lossy compression on Dex-MPTD by converting tactile data into images and applying six lossless and five lossy image codecs for efficient compression. Experimental results demonstrate that tactile data can be losslessly compressed to as low as 0.0364 bits per sub-sample (bpss), achieving approximately 200$\times$ compression ratio compared to the raw tactile data. Efficient lossy compressors like HM and VTM can achieve about 1000x data reductions while preserving acceptable data fidelity. The exploration of lossy compression also reveals that screen-content-targeted coding tools outperform general-purpose codecs in compressing tactile data.
comment: 8 pages, 10 figures, 2 tables
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
End-to-end autonomous driving (E2E-AD) demands effective processing of multi-view sensory data and robust handling of diverse and complex driving scenarios, particularly rare maneuvers such as aggressive turns. Recent success of Mixture-of-Experts (MoE) architecture in Large Language Models (LLMs) demonstrates that specialization of parameters enables strong scalability. In this work, we propose DriveMoE, a novel MoE-based E2E-AD framework, with a Scene-Specialized Vision MoE and a Skill-Specialized Action MoE. DriveMoE is built upon our $\pi_0$ Vision-Language-Action (VLA) baseline (originally from the embodied AI field), called Drive-$\pi_0$. Specifically, we add Vision MoE to Drive-$\pi_0$ by training a router to select relevant cameras according to the driving context dynamically. This design mirrors human driving cognition, where drivers selectively attend to crucial visual cues rather than exhaustively processing all visual information. In addition, we add Action MoE by training another router to activate specialized expert modules for different driving behaviors. Through explicit behavioral specialization, DriveMoE is able to handle diverse scenarios without suffering from modes averaging like existing models. In Bench2Drive closed-loop evaluation experiments, DriveMoE achieves state-of-the-art (SOTA) performance, demonstrating the effectiveness of combining vision and action MoE in autonomous driving tasks. We will release our code and models of DriveMoE and Drive-$\pi_0$.
comment: Project Page: https://thinklab-sjtu.github.io/DriveMoE/
Manipulating Elasto-Plastic Objects With 3D Occupancy and Learning-Based Predictive Control
Manipulating elasto-plastic objects remains a significant challenge due to severe self-occlusion, difficulties of representation, and complicated dynamics. This work proposes a novel framework for elasto-plastic object manipulation with a quasi-static assumption for motions, leveraging 3D occupancy to represent such objects, a learned dynamics model trained with 3D occupancy, and a learning-based predictive control algorithm to address these challenges effectively. We build a novel data collection platform to collect full spatial information and propose a pipeline for generating a 3D occupancy dataset. To infer the 3D occupancy during manipulation, an occupancy prediction network is trained with multiple RGB images supervised by the generated dataset. We design a deep neural network empowered by a 3D convolution neural network (CNN) and a graph neural network (GNN) to predict the complex deformation with the inferred 3D occupancy results. A learning-based predictive control algorithm is introduced to plan the robot actions, incorporating a novel shape-based action initialization module specifically designed to improve the planner efficiency. The proposed framework in this paper can successfully shape the elasto-plastic objects into a given goal shape and has been verified in various experiments both in simulation and the real world.
comment: 8 Pages, 5 figures, accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Behavioral Safety Assessment towards Large-scale Deployment of Autonomous Vehicles
Autonomous vehicles (AVs) have significantly advanced in real-world deployment in recent years, yet safety continues to be a critical barrier to widespread adoption. Traditional functional safety approaches, which primarily verify the reliability, robustness, and adequacy of AV hardware and software systems from a vehicle-centric perspective, do not sufficiently address the AV's broader interactions and behavioral impact on the surrounding traffic environment. To overcome this limitation, we propose a paradigm shift toward behavioral safety, a comprehensive approach focused on evaluating AV responses and interactions within the traffic environment. To systematically assess behavioral safety, we introduce a third-party AV safety assessment framework comprising two complementary evaluation components: the Driver Licensing Test and the Driving Intelligence Test. The Driver Licensing Test evaluates the AV's reactive behaviors under controlled scenarios, ensuring basic behavioral competency. In contrast, the Driving Intelligence Test assesses the AV's interactive behaviors within naturalistic traffic conditions, quantifying the frequency of safety-critical events to deliver statistically meaningful safety metrics before large-scale deployment. We validated our proposed framework using Autoware.Universe, an open-source Level 4 AV, tested both in simulated environments and on the physical test track at the University of Michigan's Mcity Testing Facility. The results indicate that Autoware.Universe passed 6 out of 14 scenarios and exhibited a crash rate of 3.01e-3 crashes per mile, approximately 1,000 times higher than the average human driver crash rate. During the tests, we also uncovered several unknown unsafe scenarios for Autoware.Universe. These findings underscore the necessity of behavioral safety evaluations for improving AV safety performance prior to widespread public deployment.
SEM: Enhancing Spatial Understanding for Robust Robot Manipulation
A key challenge in robot manipulation lies in developing policy models with strong spatial understanding, the ability to reason about 3D geometry, object relations, and robot embodiment. Existing methods often fall short: 3D point cloud models lack semantic abstraction, while 2D image encoders struggle with spatial reasoning. To address this, we propose SEM (Spatial Enhanced Manipulation model), a novel diffusion-based policy framework that explicitly enhances spatial understanding from two complementary perspectives. A spatial enhancer augments visual representations with 3D geometric context, while a robot state encoder captures embodiment-aware structure through graphbased modeling of joint dependencies. By integrating these modules, SEM significantly improves spatial understanding, leading to robust and generalizable manipulation across diverse tasks that outperform existing baselines.
EasyInsert: A Data-Efficient and Generalizable Insertion Policy
Insertion task is highly challenging that requires robots to operate with exceptional precision in cluttered environments. Existing methods often have poor generalization capabilities. They typically function in restricted and structured environments, and frequently fail when the plug and socket are far apart, when the scene is densely cluttered, or when handling novel objects. They also rely on strong assumptions such as access to CAD models or a digital twin in simulation. To address this, we propose EasyInsert, a framework which leverages the human intuition that relative pose (delta pose) between plug and socket is sufficient for successful insertion, and employs efficient and automated real-world data collection with minimal human labor to train a generalizable model for relative pose prediction. During execution, EasyInsert follows a coarse-to-fine execution procedure based on predicted delta pose, and successfully performs various insertion tasks. EasyInsert demonstrates strong zero-shot generalization capability for unseen objects in cluttered environments, handling cases with significant initial pose deviations while maintaining high sample efficiency and requiring little human effort. In real-world experiments, with just 5 hours of training data, EasyInsert achieves over 90% success in zero-shot insertion for 13 out of 15 unseen novel objects, including challenging objects like Type-C cables, HDMI cables, and Ethernet cables. Furthermore, with only one human demonstration and 4 minutes of automatically collected data for fine-tuning, it reaches over 90% success rate for all 15 objects.
Tactile-based Reinforcement Learning for Adaptive Grasping under Observation Uncertainties
Robotic manipulation in industrial scenarios such as construction commonly faces uncertain observations in which the state of the manipulating object may not be accurately captured due to occlusions and partial observables. For example, object status estimation during pipe assembly, rebar installation, and electrical installation can be impacted by observation errors. Traditional vision-based grasping methods often struggle to ensure robust stability and adaptability. To address this challenge, this paper proposes a tactile simulator that enables a tactile-based adaptive grasping method to enhance grasping robustness. This approach leverages tactile feedback combined with the Proximal Policy Optimization (PPO) reinforcement learning algorithm to dynamically adjust the grasping posture, allowing adaptation to varying grasping conditions under inaccurate object state estimations. Simulation results demonstrate that the proposed method effectively adapts grasping postures, thereby improving the success rate and stability of grasping tasks.
RE-TRIP : Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition
While most people associate LiDAR primarily with its ability to measure distances and provide geometric information about the environment (via point clouds), LiDAR also captures additional data, including reflectivity or intensity values. Unfortunately, when LiDAR is applied to Place Recognition (PR) in mobile robotics, most previous works on LiDAR-based PR rely only on geometric measurements, neglecting the additional reflectivity information that LiDAR provides. In this paper, we propose a novel descriptor for 3D PR, named RE-TRIP (REflectivity-instance augmented TRIangle descriPtor). This new descriptor leverages both geometric measurements and reflectivity to enhance robustness in challenging scenarios such as geometric degeneracy, high geometric similarity, and the presence of dynamic objects. To implement RE-TRIP in real-world applications, we further propose (1) a keypoint extraction method, (2) a key instance segmentation method, (3) a RE-TRIP matching method, and (4) a reflectivity-combined loop verification method. Finally, we conduct a series of experiments to demonstrate the effectiveness of RE-TRIP. Applied to public datasets (i.e., HELIPR, FusionPortable) containing diverse scenarios such as long corridors, bridges, large-scale urban areas, and highly dynamic environments -- our experimental results show that the proposed method outperforms existing state-of-the-art methods in terms of Scan Context, Intensity Scan Context, and STD.
Event-based Reconfiguration Control for Time-varying Formation of Robot Swarms in Narrow Spaces
This study proposes an event-based reconfiguration control to navigate a robot swarm through challenging environments with narrow passages such as valleys, tunnels, and corridors. The robot swarm is modeled as an undirected graph, where each node represents a robot capable of collecting real-time data on the environment and the states of other robots in the formation. This data serves as the input for the controller to provide dynamic adjustments between the desired and straight-line configurations. The controller incorporates a set of behaviors, designed using artificial potential fields, to meet the requirements of goal-oriented motion, formation maintenance, tailgating, and collision avoidance. The stability of the formation control is guaranteed via the Lyapunov theorem. Simulation and comparison results show that the proposed controller not only successfully navigates the robot swarm through narrow spaces but also outperforms other established methods in key metrics including the success rate, heading order, speed, travel time, and energy efficiency. Software-in-the-loop tests have also been conducted to validate the controller's applicability in practical scenarios. The source code of the controller is available at https://github.com/duynamrcv/erc.
UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing
Gesture recognition presents a promising avenue for interfacing with unmanned aerial vehicles (UAVs) due to its intuitive nature and potential for precise interaction. This research conducts a comprehensive comparative analysis of vision-based hand gesture detection methodologies tailored for UAV Control. The existing gesture recognition approaches involving cropping, zooming, and color-based segmentation, do not work well for this kind of applications in dynamic conditions and suffer in performance with increasing distance and environmental noises. We propose to use a novel approach leveraging hand landmarks drawing and classification for gesture recognition based UAV control. With experimental results we show that our proposed method outperforms the other existing methods in terms of accuracy, noise resilience, and efficacy across varying distances, thus providing robust control decisions. However, implementing the deep learning based compute intensive gesture recognition algorithms on the UAV's onboard computer is significantly challenging in terms of performance. Hence, we propose to use a edge-computing based framework to offload the heavier computing tasks, thus achieving closed-loop real-time performance. With implementation over AirSim simulator as well as over a real-world UAV, we showcase the advantage of our end-to-end gesture recognition based UAV control system.
ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems
We introduce ScanBot, a novel dataset designed for instruction-conditioned, high-precision surface scanning in robotic systems. In contrast to existing robot learning datasets that focus on coarse tasks such as grasping, navigation, or dialogue, ScanBot targets the high-precision demands of industrial laser scanning, where sub-millimeter path continuity and parameter stability are critical. The dataset covers laser scanning trajectories executed by a robot across 12 diverse objects and 6 task types, including full-surface scans, geometry-focused regions, spatially referenced parts, functionally relevant structures, defect inspection, and comparative analysis. Each scan is guided by natural language instructions and paired with synchronized RGB, depth, and laser profiles, as well as robot pose and joint states. Despite recent progress, existing vision-language action (VLA) models still fail to generate stable scanning trajectories under fine-grained instructions and real-world precision demands. To investigate this limitation, we benchmark a range of multimodal large language models (MLLMs) across the full perception-planning-execution loop, revealing persistent challenges in instruction-following under realistic constraints.
comment: 17 pages, 11 figures
Construction of an Impedance Control Test Bench
Controlling the physical interaction with the environment or objects, as humans do, is a shared requirement across different types of robots. To effectively control this interaction, it is necessary to control the power delivered to the load, that is, the interaction force and the interaction velocity. However, it is not possible to control these two quantities independently at the same time. An alternative is to control the relation between them, with Impedance and Admittance control, for example. The Impedance Control 2 Dimensions (IC2D) bench is a test bench designed to allow the performance analysis of different actuators and controllers at the joint level. Therefore, it was designed to be as versatile as possible, to allow the combination of linear and/or rotational motions, to use electric and/or hydraulic actuators, with loads known and defined by the user. The bench adheres to a set of requirements defined by the demands of the research group, to be a reliable, backlash-free mechatronic system to validate system dynamics models and controller designs, as well as a valuable experimental setup for benchmarking electric and hydraulic actuators. This article presents the mechanical, electrical, and hydraulic configurations used to ensure the robustness and reliability of the test bench. Benches similar to this one are commonly found in robotics laboratories around the world. However, the IC2D stands out for its versatility and reliability, as well as for supporting hydraulic and electric actuators.
comment: 10 pages, 23 figures
ConvoyNext: A Scalable Testbed Platform for Cooperative Autonomous Vehicle Systems
The advancement of cooperative autonomous vehicle systems depends heavily on effective coordination between multiple agents, aiming to enhance traffic efficiency, fuel economy, and road safety. Despite these potential benefits, real-world testing of such systems remains a major challenge and is essential for validating control strategies, trajectory modeling methods, and communication robustness across diverse environments. To address this need, we introduce ConvoyNext, a scalable, modular, and extensible platform tailored for the real-world evaluation of cooperative driving behaviors. We demonstrate the capabilities of ConvoyNext through a series of experiments involving convoys of autonomous vehicles navigating complex trajectories. These tests highlight the platform's robustness across heterogeneous vehicle configurations and its effectiveness in assessing convoy behavior under varying communication conditions, including intentional packet loss. Our results validate ConvoyNext as a comprehensive, open-access testbed for advancing research in cooperative autonomous vehicle systems.
Navigating Polytopes with Safety: A Control Barrier Function Approach
Collision-free motion is a fundamental requirement for many autonomous systems. This paper develops a safety-critical control approach for the collision-free navigation of polytope-shaped agents in polytope-shaped environments. A systematic method is proposed to generate control barrier function candidates in closed form that lead to controllers with formal safety guarantees. The proposed approach is demonstrated through simulation, with obstacle avoidance examples in 2D and 3D, including dynamically changing environments.
comment: Accepted to the 2025 IEEE Conference on Control Technology and Applications (CCTA). 6 pages, 5 figures
LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios
Recent advances in autonomous driving research towards motion planners that are robust, safe, and adaptive. However, existing rule-based and data-driven planners lack adaptability to long-tail scenarios, while knowledge-driven methods offer strong reasoning but face challenges in representation, control, and real-world evaluation. To address these challenges, we present LiloDriver, a lifelong learning framework for closed-loop motion planning in long-tail autonomous driving scenarios. By integrating large language models (LLMs) with a memory-augmented planner generation system, LiloDriver continuously adapts to new scenarios without retraining. It features a four-stage architecture including perception, scene encoding, memory-based strategy refinement, and LLM-guided reasoning. Evaluated on the nuPlan benchmark, LiloDriver achieves superior performance in both common and rare driving scenarios, outperforming static rule-based and learning-based planners. Our results highlight the effectiveness of combining structured memory and LLM reasoning to enable scalable, human-like motion planning in real-world autonomous driving. Our code is available at https://github.com/Hyan-Yao/LiloDriver.
comment: 7 pages, 3 figures
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can enhance autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods over-rely on LLM outputs, which are prone to hallucinations. Evaluations show that state-of-the-art LLM indicates a non-hallucination rate of only approximately 57.95% when assessed on essential driving-related tasks. Thus, in these methods, hallucinations from the LLM can directly jeopardize the performance of driving policies. This paper argues that maintaining relative independence between the LLM and the RL is vital for solving the hallucinations problem. Consequently, this paper is devoted to propose a novel LLM-Hinted RL paradigm. The LLM is used to generate semantic hints for state augmentation and policy optimization to assist RL agent in motion planning, while the RL agent counteracts potential erroneous semantic indications through policy learning to achieve excellent driving performance. Based on this paradigm, we propose the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which is designed that includes Augmented Semantic Representation Module to extend state space. Contextual Stability Anchor Module enhances the reliability of multi-critic weight hints by utilizing information from the knowledge base. Semantic Cache Module is employed to seamlessly integrate LLM low-frequency guidance with RL high-frequency control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance. HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities. Under safety-critical driving conditions, HCRMP significantly reduces the collision rate by 11.4%, which effectively improves the driving performance in complex scenarios.
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning. To overcome this, we introduce \textbf{AgentThink}, a pioneering unified framework that, for the first time, integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks. AgentThink's core innovations include: \textbf{(i) Structured Data Generation}, by establishing an autonomous driving tool library to automatically construct structured, self-verified reasoning data explicitly incorporating tool usage for diverse driving scenarios; \textbf{(ii) A Two-stage Training Pipeline}, employing Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to equip VLMs with the capability for autonomous tool invocation; and \textbf{(iii) Agent-style Tool-Usage Evaluation}, introducing a novel multi-tool assessment protocol to rigorously evaluate the model's tool invocation and utilization. Experiments on the DriveLMM-o1 benchmark demonstrate AgentThink significantly boosts overall reasoning scores by \textbf{53.91\%} and enhances answer accuracy by \textbf{33.54\%}, while markedly improving reasoning quality and consistency. Furthermore, ablation studies and robust zero-shot/few-shot generalization experiments across various benchmarks underscore its powerful capabilities. These findings highlight a promising trajectory for developing trustworthy and tool-aware autonomous driving models.
comment: 18 pages, 8 figures
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
Leveraging pretrained Vision-Language Models (VLMs) to map language instruction and visual observations to raw low-level actions, Vision-Language-Action models (VLAs) hold great promise for achieving general-purpose robotic systems. Despite their advancements, existing VLAs tend to spuriously correlate task-irrelevant visual features with actions, limiting their generalization capacity beyond the training data. To tackle this challenge, we propose Intrinsic Spatial Reasoning (InSpire), a simple yet effective approach that mitigates the adverse effects of spurious correlations by boosting the spatial reasoning ability of VLAs. Specifically, InSpire redirects the VLA's attention to task-relevant factors by prepending the question "In which direction is the [object] relative to the robot?" to the language instruction and aligning the answer "right/left/up/down/front/back/grasped" and predicted actions with the ground-truth. Notably, InSpire can be used as a plugin to enhance existing autoregressive VLAs, requiring no extra training data or interaction with other large models. Extensive experimental results in both simulation and real-world environments demonstrate the effectiveness and flexibility of our approach. Our code, pretrained models and demos are publicly available at: https://Koorye.github.io/proj/Inspire.
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
We present GrasMolmo, a generalizable open-vocabulary task-oriented grasping (TOG) model. GraspMolmo predicts semantically appropriate, stable grasps conditioned on a natural language instruction and a single RGB-D frame. For instance, given "pour me some tea", GraspMolmo selects a grasp on a teapot handle rather than its body. Unlike prior TOG methods, which are limited by small datasets, simplistic language, and uncluttered scenes, GraspMolmo learns from PRISM, a novel large-scale synthetic dataset of 379k samples featuring cluttered environments and diverse, realistic task descriptions. We fine-tune the Molmo visual-language model on this data, enabling GraspMolmo to generalize to novel open-vocabulary instructions and objects. In challenging real-world evaluations, GraspMolmo achieves state-of-the-art results, with a 70% prediction success on complex tasks, compared to the 35% achieved by the next best alternative. GraspMolmo also successfully demonstrates the ability to predict semantically correct bimanual grasps zero-shot. We release our synthetic dataset, code, model, and benchmarks to accelerate research in task-semantic robotic manipulation, which, along with videos, are available at https://abhaybd.github.io/GraspMolmo/.
What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study
Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policy has emerged as a promising alternative due to its ability to directly map observations to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap, where RL-based policies often experience instability when deployed in real world. In this paper, we investigate key factors for learning robust RL-based control policies that are capable of zero-shot deployment in real-world quadrotors. We identify five critical factors and we develop a PPO-based training framework named SimpleFlight, which integrates these five techniques. We validate the efficacy of SimpleFlight on Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels across both smooth polynominal trajectories and challenging infeasible zigzag trajectories on small thrust-to-weight quadrotors. In contrast, baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into a GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.
comment: The first two authors contribute equally; Accepted by RA-L
Neural Internal Model Control: Learning a Robust Control Policy via Predictive Error Feedback
Accurate motion control in the face of disturbances within complex environments remains a major challenge in robotics. Classical model-based approaches often struggle with nonlinearities and unstructured disturbances, while RL-based methods can be fragile when encountering unseen scenarios. In this paper, we propose a novel framework, Neural Internal Model Control, which integrates model-based control with RL-based control to enhance robustness. Our framework streamlines the predictive model by applying Newton-Euler equations for rigid-body dynamics, eliminating the need to capture complex high-dimensional nonlinearities. This internal model combines model-free RL algorithms with predictive error feedback. Such a design enables a closed-loop control structure to enhance the robustness and generalizability of the control system. We demonstrate the effectiveness of our framework on both quadrotors and quadrupedal robots, achieving superior performance compared to state-of-the-art methods. Furthermore, real-world deployment on a quadrotor with rope-suspended payloads highlights the framework's robustness in sim-to-real transfer. Our code is released at https://github.com/thu-uav/NeuralIMC.
comment: Accepted by RA-L
GOTPR: General Outdoor Text-based Place Recognition Using Scene Graph Retrieval with OpenStreetMap
We propose GOTPR, a robust place recognition method designed for outdoor environments where GPS signals are unavailable. Unlike existing approaches that use point cloud maps, which are large and difficult to store, GOTPR leverages scene graphs generated from text descriptions and maps for place recognition. This method improves scalability by replacing point clouds with compact data structures, allowing robots to efficiently store and utilize extensive map data. In addition, GOTPR eliminates the need for custom map creation by using publicly available OpenStreetMap data, which provides global spatial information. We evaluated its performance using the KITTI360Pose dataset with corresponding OpenStreetMap data, comparing it to existing point cloud-based place recognition methods. The results show that GOTPR achieves comparable accuracy while significantly reducing storage requirements. In city-scale tests, it completed processing within a few seconds, making it highly practical for real-world robotics applications. More information can be found at https://donghwijung.github.io/GOTPR_page/.
Multi-layer Motion Planning with Kinodynamic and Spatio-Temporal Constraints SC
We propose a novel, multi-layered planning approach for computing paths that satisfy both kinodynamic and spatiotemporal constraints. Our three-part framework first establishes potential sequences to meet spatial constraints, using them to calculate a geometric lead path. This path then guides an asymptotically optimal sampling-based kinodynamic planner, which minimizes an STL-robustness cost to jointly satisfy spatiotemporal and kinodynamic constraints. In our experiments, we test our method with a velocity-controlled Ackerman-car model and demonstrate significant efficiency gains compared to prior art. Additionally, our method is able to generate complex path maneuvers, such as crossovers, something that previous methods had not demonstrated.
comment: Accepted to ACM Hybrid Systems: Computation and Control (HSCC) 2025
Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots
Mobile smartphones compactly provide sensors such as cameras, IMUs, GNSS measurement units, and wireless and wired communication channels required for robotics projects. They are affordable, portable, and programmable, which makes them ideal for testing, data acquisition, controlling mobile robots, and many other robotic applications. A robotic system is proposed in this paper, consisting of an Android phone, a microcontroller board attached to the phone via USB, and a remote wireless controller station. In the data acquisition mode, the Android device can record a dataset of a diverse configuration of multiple cameras, IMUs, GNSS units, and external USB ADC channels in the rawest format used for, but not limited to, pose estimation and scene reconstruction applications. In robot control mode, the Android phone, a microcontroller board, and other peripherals constitute the mobile or stationary robotic system. This system is controlled using a remote server connected over Wi-Fi or Bluetooth. Experiments show that although the SLAM and AR applications can utilize the acquired data, the proposed system can pave the way for more advanced algorithms for processing these noisy and sporadic measurements. Moreover, the characteristics of the communication media are studied, and two example robotic projects, which involve controlling a toy car and a quadcopter, are included.
comment: Project repository: https://github.com/m-dayani/robo-platform Youtube Video: https://youtu.be/BTQ4yLB1bak Dataset: https://drive.google.com/drive/folders/1OZqdA1xa-SyJ64qL_TibqhtwhR1fWWrx?usp=sharing
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration AAAI 2025
Understanding how humans cooperatively utilize semantic knowledge to explore unfamiliar environments and decide on navigation directions is critical for house service multi-robot systems. Previous methods primarily focused on single-robot centralized planning strategies, which severely limited exploration efficiency. Recent research has considered decentralized planning strategies for multiple robots, assigning separate planning models to each robot, but these approaches often overlook communication costs. In this work, we propose Multimodal Chain-of-Thought Co-Navigation (MCoCoNav), a modular approach that utilizes multimodal Chain-of-Thought to plan collaborative semantic navigation for multiple robots. MCoCoNav combines visual perception with Vision Language Models (VLMs) to evaluate exploration value through probabilistic scoring, thus reducing time costs and achieving stable outputs. Additionally, a global semantic map is used as a communication bridge, minimizing communication overhead while integrating observational results. Guided by scores that reflect exploration trends, robots utilize this map to assess whether to explore new frontier points or revisit history nodes. Experiments on HM3D_v0.2 and MP3D demonstrate the effectiveness of our approach. Our code is available at https://github.com/FrankZxShen/MCoCoNav.git.
comment: 16 pages, 10 figures, Extended Version of accepted AAAI 2025 Paper
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
Dexterous grasping remains a fundamental yet challenging problem in robotics. A general-purpose robot must be capable of grasping diverse objects in arbitrary scenarios. However, existing research typically relies on restrictive assumptions, such as single-object settings or limited environments, leading to constrained generalization. We present DexGraspVLA, a hierarchical framework for general dexterous grasping in cluttered scenes based on RGB image perception and language instructions. It utilizes a pre-trained Vision-Language model as the high-level task planner and learns a diffusion-based policy as the low-level Action controller. The key insight to achieve robust generalization lies in iteratively transforming diverse language and visual inputs into domain-invariant representations via foundation models, where imitation learning can be effectively applied due to the alleviation of domain shift. Notably, our method achieves a 90+% success rate under thousands of unseen object, lighting, and background combinations in a "zero-shot" environment. Empirical analysis confirms the consistency of internal model behavior across environmental variations, thereby validating our design and explaining its generalization performance. DexGraspVLA also demonstrates free-form long-horizon prompt execution, robustness to adversarial objects and human disturbance, and failure recovery, which are rarely achieved simultaneously in prior work. Extended application to nonprehensile object grasping further proves its generality. Code, model, and video are available at dexgraspvla.github.io.
comment: 26 pages, 12 figures
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
Pre-training vision-language representations on human action videos has emerged as a promising approach to reduce reliance on large-scale expert demonstrations for training embodied agents. However, prior methods often employ time contrastive learning based on goal-reaching heuristics, progressively aligning language instructions from the initial to the final frame. This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments in the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraint. AcTOL treats a video as a continuous trajectory where it (1) contrasts semantic differences between frames to reflect their natural ordering, and (2) imposes a local Brownian bridge constraint to ensure smooth transitions across intermediate frames. Extensive imitation learning experiments on both simulated and real robots show that the pretrained features significantly enhance downstream manipulation tasks with high robustness to different linguistic styles of instructions, offering a viable pathway toward generalized embodied agents.
Strengthening Generative Robot Policies through Predictive World Modeling
We present generative predictive control (GPC), a learning control framework that (i) clones a generative diffusion-based policy from expert demonstrations, (ii) trains a predictive action-conditioned world model from both expert demonstrations and random explorations, and (iii) synthesizes an online planner that ranks and optimizes the action proposals from (i) by looking ahead into the future using the world model from (ii). Across a variety of robotic manipulation tasks, we demonstrate that GPC consistently outperforms behavior cloning in both state-based and vision-based settings, in simulation and in the real world.
comment: Website: https://computationalrobotics.seas.harvard.edu/GPC
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving CVPR 2025
This paper introduces VisionPAD, a novel self-supervised pre-training paradigm designed for vision-centric algorithms in autonomous driving. In contrast to previous approaches that employ neural rendering with explicit depth supervision, VisionPAD utilizes more efficient 3D Gaussian Splatting to reconstruct multi-view representations using only images as supervision. Specifically, we introduce a self-supervised method for voxel velocity estimation. By warping voxels to adjacent frames and supervising the rendered outputs, the model effectively learns motion cues in the sequential data. Furthermore, we adopt a multi-frame photometric consistency approach to enhance geometric perception. It projects adjacent frames to the current frame based on rendered depths and relative poses, boosting the 3D geometric representation through pure image supervision. Extensive experiments on autonomous driving datasets demonstrate that VisionPAD significantly improves performance in 3D object detection, occupancy prediction and map segmentation, surpassing state-of-the-art pre-training strategies by a considerable margin.
comment: Accepted at CVPR 2025
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that the clutch can deliver a peak holding force of 380 N with a 1.48 W consumption, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving
Environment prediction frameworks are critical for the safe navigation of autonomous vehicles (AVs) in dynamic settings. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation, enabling self-supervised joint scene predictions while exhibiting resilience to partial observability and perception detection failures. Prior approaches have focused on deterministic L-OGM prediction architectures within the grid cell space. While these methods have seen some success, they frequently produce unrealistic predictions and fail to capture the stochastic nature of the environment. Additionally, they do not effectively integrate additional sensor modalities present in AVs. Our proposed framework, Latent Occupancy Prediction (LOPR), performs stochastic L-OGM prediction in the latent space of a generative architecture and allows for conditioning on RGB cameras, maps, and planned trajectories. We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder, which can further refine the decoded frames to address temporal consistency issues and reduce compression losses. Our experiments on the nuScenes and Waymo Open datasets show that all variants of our approach qualitatively and quantitatively outperform prior approaches.
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
Safe PDE Boundary Control with Neural Operators
The physical world dynamics are generally governed by underlying partial differential equations (PDEs) with unknown analytical forms in science and engineering problems. Neural network based data-driven approaches have been heavily studied in simulating and solving PDE problems in recent years, but it is still challenging to move forward from understanding to controlling the unknown PDE dynamics. PDE boundary control instantiates a simplified but important problem by only focusing on PDE boundary conditions as the control input and output. However, current model-free PDE controllers cannot ensure the boundary output satisfies some given user-specified safety constraint. To this end, we propose a safety filtering framework to guarantee the boundary output stays within the safe set for current model-free controllers. Specifically, we first introduce a neural boundary control barrier function (BCBF) to ensure the feasibility of the trajectory-wise constraint satisfaction of boundary output. Based on the neural operator modeling the transfer function from boundary control input to output trajectories, we show that the change in the BCBF depends linearly on the change in input boundary, so quadratic programming-based safety filtering can be done for pre-trained model-free controllers. Extensive experiments under challenging hyperbolic, parabolic and Navier-Stokes PDE dynamics environments validate the plug-and-play effectiveness of the proposed method by achieving better general performance and boundary constraint satisfaction compared to the vanilla and constrained model-free controller baselines. The code is available at https://github.com/intelligent-control-lab/safe-pde-control.
comment: L4DC 2025 full version, 28 pages, 5 figures, 8 tables
Multiagent Systems
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
LLM-based multi-agent systems (MAS) extend the capabilities of single LLMs by enabling cooperation among multiple specialized agents. However, most existing MAS frameworks rely on a single LLM to drive all agents, constraining the system's intelligence to the limit of that model. This paper explores the paradigm of heterogeneous LLM-driven MAS (X-MAS), where agents are powered by diverse LLMs, elevating the system's potential to the collective intelligence of diverse LLMs. We introduce X-MAS-Bench, a comprehensive testbed designed to evaluate the performance of various LLMs across different domains and MAS-related functions. As an extensive empirical study, we assess 27 LLMs across 5 domains (encompassing 21 test sets) and 5 functions, conducting over 1.7 million evaluations to identify optimal model selections for each domain-function combination. Building on these findings, we demonstrate that transitioning from homogeneous to heterogeneous LLM-driven MAS can significantly enhance system performance without requiring structural redesign. Specifically, in a chatbot-only MAS scenario, the heterogeneous configuration yields up to 8.4\% performance improvement on the MATH dataset. In a mixed chatbot-reasoner scenario, the heterogeneous MAS could achieve a remarkable 47\% performance boost on the AIME dataset. Our results underscore the transformative potential of heterogeneous LLMs in MAS, highlighting a promising avenue for advancing scalable, collaborative AI systems.
comment: 19 pages, 5 figures
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems
LLM-based multi-agent systems (MAS) have demonstrated significant potential in enhancing single LLMs to address complex and diverse tasks in practical applications. Despite considerable advancements, the field lacks a unified codebase that consolidates existing methods, resulting in redundant re-implementation efforts, unfair comparisons, and high entry barriers for researchers. To address these challenges, we introduce MASLab, a unified, comprehensive, and research-friendly codebase for LLM-based MAS. (1) MASLab integrates over 20 established methods across multiple domains, each rigorously validated by comparing step-by-step outputs with its official implementation. (2) MASLab provides a unified environment with various benchmarks for fair comparisons among methods, ensuring consistent inputs and standardized evaluation protocols. (3) MASLab implements methods within a shared streamlined structure, lowering the barriers for understanding and extension. Building on MASLab, we conduct extensive experiments covering 10+ benchmarks and 8 models, offering researchers a clear and comprehensive view of the current landscape of MAS methods. MASLab will continue to evolve, tracking the latest developments in the field, and invite contributions from the broader open-source community.
comment: 18 pages, 11 figures
Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning
The rise of Large Language Model-based Multi-Agent Planning has leveraged advanced frameworks to enable autonomous and collaborative task execution. Some systems rely on platforms like review sites and social media, which are prone to fraudulent information, such as fake reviews or misleading descriptions. This reliance poses risks, potentially causing financial losses and harming user experiences. To evaluate the risk of planning systems in real-world applications, we introduce \textbf{WandaPlan}, an evaluation environment mirroring real-world data and injected with deceptive content. We assess system performance across three fraud cases: Misinformation Fraud, Team-Coordinated Multi-Person Fraud, and Level-Escalating Multi-Round Fraud. We reveal significant weaknesses in existing frameworks that prioritize task efficiency over data authenticity. At the same time, we validate WandaPlan's generalizability, capable of assessing the risks of real-world open-source planning frameworks. To mitigate the risk of fraud, we propose integrating an anti-fraud agent, providing a solution for reliable planning.
Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design
Single-agent LLMs hit hard limits--finite context, role overload, and brittle domain transfer. Conventional multi-agent fixes soften those edges yet expose fresh pains: ill-posed decompositions, fuzzy contracts, and verification overhead that blunts the gains. We therefore present Know-The-Ropes (KtR), a framework that converts domain priors into an algorithmic blueprint hierarchy, in which tasks are recursively split into typed, controller-mediated subtasks, each solved zero-shot or with the lightest viable boost (e.g., chain-of-thought, micro-tune, self-check). Grounded in the No-Free-Lunch theorem, KtR trades the chase for a universal prompt for disciplined decomposition. On the Knapsack problem (3-8 items), three GPT-4o-mini agents raise accuracy from 3% zero-shot to 95% on size-5 instances after patching a single bottleneck agent. On the tougher Task-Assignment problem (6-15 jobs), a six-agent o3-mini blueprint hits 100% up to size 10 and 84% on sizes 13-15, versus 11% zero-shot. Algorithm-aware decomposition plus targeted augmentation thus turns modest models into reliable collaborators--no ever-larger monoliths required.
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use NAACL 2025
Enterprise customers are increasingly adopting Large Language Models (LLMs) for critical communication tasks, such as drafting emails, crafting sales pitches, and composing casual messages. Deploying such models across different regions requires them to understand diverse cultural and linguistic contexts and generate safe and respectful responses. For enterprise applications, it is crucial to mitigate reputational risks, maintain trust, and ensure compliance by effectively identifying and handling unsafe or offensive language. To address this, we introduce SweEval, a benchmark simulating real-world scenarios with variations in tone (positive or negative) and context (formal or informal). The prompts explicitly instruct the model to include specific swear words while completing the task. This benchmark evaluates whether LLMs comply with or resist such inappropriate instructions and assesses their alignment with ethical frameworks, cultural nuances, and language comprehension capabilities. In order to advance research in building ethically aligned AI systems for enterprise use and beyond, we release the dataset and code: https://github.com/amitbcp/multilingual_profanity.
comment: Published in the Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Industry Track, pages 558-582
Distribution through Repeated Market with Buying Rights
Resource distribution is a fundamental problem in economic and policy design, particularly when demand and supply are not naturally aligned. Without regulation, wealthier individuals may monopolize this resource, leaving the needs of others unsatisfied. While centralized distribution can ensure fairer division, it can struggle to manage logistics efficiently, and adapt to changing conditions, often leading to shortages, surpluses, and bureaucratic inefficiencies. Building on previous research on market-based redistribution, we examine a repeated hybrid market that incorporates buying rights. These rights, distributed iteratively by a central authority (for instance, as digital tokens), are intended to enhance fairness in the system - a unit of right is required to acquire a unit of the resource, but the rights themselves can also be traded alongside the resource in the market. We analyze how this regulatory mechanism influences the distribution of the scarce resource in the hybrid market over time. Unlike past works that relied on empirical methods, we explore the exact analytical properties of a system in which traders optimize over multiple rounds. We identify its market equilibrium, which is a natural generalization of the free market equilibrium, and show that it is coalition-proof. To assess the fairness in the system, we use the concept of frustration, which measures the gap between the resources a buyer is entitled to through their buying rights and what they actually obtain through trading. Our main theoretical result shows that using buying rights reduces the frustration by at least half compared to the free market. Empirical evaluations further support our findings, suggesting the system performs well even beyond the theoretically studied assumptions.
Let's Get You Hired: A Job Seeker's Perspective on Multi-Agent Recruitment Systems for Explaining Hiring Decisions
During job recruitment, traditional applicant selection methods often lack transparency. Candidates are rarely given sufficient justifications for recruiting decisions, whether they are made manually by human recruiters or through the use of black-box Applicant Tracking Systems (ATS). To address this problem, our work introduces a multi-agent AI system that uses Large Language Models (LLMs) to guide job seekers during the recruitment process. Using an iterative user-centric design approach, we first conducted a two-phased exploratory study with four active job seekers to inform the design and development of the system. Subsequently, we conducted an in-depth, qualitative user study with 20 active job seekers through individual one-to-one interviews to evaluate the developed prototype. The results of our evaluation demonstrate that participants perceived our multi-agent recruitment system as significantly more actionable, trustworthy, and fair compared to traditional methods. Our study further helped us uncover in-depth insights into factors contributing to these perceived user experiences. Drawing from these insights, we offer broader design implications for building user-aligned, multi-agent explainable AI systems across diverse domains.
comment: Pre-print version only. Please check the published version for any reference or citation
Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System
Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions. This approach not only mitigates limitations inherent in individual studies but also facilitates novel discoveries through integrated data analysis. Traditional meta-analysis involves a complex multi-stage pipeline including literature retrieval, paper screening, and data extraction, which demands substantial human effort and time. However, while LLM-based methods can accelerate certain stages, they still face significant challenges, such as hallucinations in paper screening and data extraction. In this paper, we propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls. The hybrid review, hierarchical extraction, self-proving, and feedback checking strategies implemented in Manalyzer significantly alleviate these two hallucinations. To comprehensively evaluate the performance of meta-analysis, we construct a new benchmark comprising 729 papers across 3 domains, encompassing text, image, and table modalities, with over 10,000 data points. Extensive experiments demonstrate that Manalyzer achieves significant performance improvements over the LLM baseline in multi meta-analysis tasks. Project page: https://black-yt.github.io/meta-analysis-page/ .
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
Adaptability to specialised or homogeneous behaviours is critical in cooperative multi-agent reinforcement learning (MARL). Parameter sharing (PS) techniques, common for efficient adaptation, often limit behavioural diversity due to cross-agent gradient interference, which we show can be exacerbated by the coupling of observations and agent IDs. Current remedies typically add complexity through altered objectives, manual preset diversity levels, or sequential updates. We ask: can shared policies adapt without these complexities? We propose HyperMARL, a PS approach using hypernetworks for dynamic agent-specific parameters, without altering the RL objective or requiring preset diversity levels. HyperMARL's explicit decoupling of observation- and agent-conditioned gradients empirically reduces policy gradient variance, facilitates shared-policy adaptation (including specialisation), and helps mitigate cross-agent interference. Across diverse MARL benchmarks (up to 20 agents), requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL achieves competitive performance against key baselines -- fully shared, non-parameter sharing, and three diversity-promoting methods -- while preserving behavioural diversity comparable to non-parameter sharing. These findings establish HyperMARL as a versatile approach for adaptive MARL. The code is publicly available at https://github.com/KaleabTessera/HyperMARL.
Multi-Agent Corridor Generating Algorithm
In this paper, we propose the Multi-Agent Corridor Generating Algorithm (MACGA) for solving the Multi-agent Pathfinding (MAPF) problem, where a group of agents need to find non-colliding paths to their target locations. Existing approaches struggle to solve dense MAPF instances. In MACGA, the agents build \emph{corridors}, which are sequences of connected vertices, from current locations towards agents' goals, and evacuate other agents out of the corridors to avoid collisions and deadlocks. We also present the MACGA+PIBT algorithm, which integrates the well-known rule-based PIBT algorithm into MACGA to improve runtime and solution quality. The proposed algorithms run in polynomial time and have a reachability property, i.e., every agent is guaranteed to reach its goal location at some point. We demonstrate experimentally that MACGA and MACGA+PIBT outperform baseline algorithms in terms of success rate, runtime, and makespan across diverse MAPF benchmark grids.
Homotopy-Aware Multi-Agent Path Planning on Plane
We propose an efficient framework using Dynnikov coordinates for homotopy-aware multi-agent path planning in planar domains that may contain obstacles. We developed a method for generating multiple homotopically distinct solutions for the multi-agent path planning problem in planar domains by combining our framework with revised prioritized planning and proved its completeness under specific assumptions. Experimentally, we demonstrated that our method is significantly faster than a method without Dynnikov coordinates. We also confirmed experimentally that homotopy-aware planning contributes to avoiding locally optimal solutions when searching for low-cost trajectories for a swarm of agents in a continuous environment.
comment: 21 pages with 5 pages of references and appendices, 23 figures
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed this issue through value decomposition methods under the centralized training with decentralized execution paradigm, where neural networks are utilized to approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success in various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose a novel algorithm, \textbf{QLLM}, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Specifically, the concept of \textbf{TFCAF} is introduced, wherein the credit allocation process is represented as a direct and expressive nonlinear functional formulation. A custom-designed \textit{coder-evaluator} framework is further employed to guide the generation, verification, and refinement of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM exhibits strong generalization capability and maintains compatibility with a wide range of MARL algorithms that utilize mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios.
comment: 17 pages, 10 figures, 5 tables
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents
Large language models (LLMs) based agent systems have made great strides in real-world applications beyond traditional NLP tasks. This paper proposes a new LLM-powered Multi-Agent System (LLM-MAS) benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments. Collab-Overcooked extends existing benchmarks from two novel perspectives. First, it provides a multi-agent framework supporting diverse tasks and objectives and encourages collaboration through natural language communication. Second, it introduces a spectrum of process-oriented evaluation metrics to assess the fine-grained collaboration capabilities of different LLM agents, a dimension often overlooked in prior work. We conduct extensive experiments over 11 popular LLMs and show that, while the LLMs present a strong ability in goal interpretation, there is a significant discrepancy in active collaboration and continuous adaptation which are critical for efficiently fulfilling complicated tasks. Notably, we highlight the strengths and weaknesses in LLM-MAS and provide insights for improving and evaluating LLM-MAS on a unified and open-sourced benchmark. The environments, 30 open-ended tasks, and the evaluation package are publicly available at https://github.com/YusaeMeow/Collab-Overcooked.
comment: 30 pages, 17 figures
Transforming the Hybrid Cloud for Emerging AI Workloads
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.
comment: 70 pages, 27 figures
Systems and Control (CS)
Delayed dynamic-feedback controller design for multi-frequency vibration suppression
We present a methodology for designing a dynamic controller with delayed output feedback for achieving non-collocated vibration suppression with a focus on the multi-frequency case. To synthesize the delay-based controller, we first remodel the system of equations as a delay-differential algebraic equation (DDAE) in such a way that existing tools for design of a static output feedback controller can be easily adapted. The problem of achieving non-collocated vibration suppression with sufficient damping is formulated as a constrained optimization problem of minimizing the spectral abscissa in the presence of zero-location constraints, with the constraints exhibiting polynomial dependence on its parameters. We transform the problem into an unconstrained one using elimination, following which we solve the resulting non-convex, non-smooth optimization problem.
comment: 7 pages, 4 figures, submitted April 24, 2024, to Time Delay Systems (TDS 2024)
Modeling and Constraint-Aware Control of Pressure Dynamics in Water Electrolysis Systems
This paper addresses the challenge of pressure constraint violations in water electrolysis systems operating under dynamic power conditions, a problem common to both Proton Exchange Membrane and alkaline technologies. To investigate this issue, a control-oriented model of an alkaline electrolyzer is developed, capturing key pressure and flow dynamics. To manage rapid power fluctuations that may cause pressure to exceed manufacturer-defined operational boundaries, a model-based constraint-aware power governor based on the Reference Governor (RG) framework is proposed. Simulation results show that the strategy effectively maintains pressure within the specified operating range, outperforming conventional filtering methods while enhancing hydrogen production and reducing auxiliary energy consumption.
On the Deployment of RIS-mounted UAV Networks
Reconfigurable intelligent surfaces (RIS) enable smart wireless environments by dynamically controlling signal propagation to enhance communication and localization. Unmanned aerial vehicles (UAVs) can act as flying base stations and thus, improve system performance by avoiding signal blockages. In this paper, we propose a gradient ascent and coordinate search based method to determine the optimal location for a system that consists of a UAV and a RIS, where the UAV serves cellular users (CUs) and the RIS serves device-to-device (D2D) pairs. In particular, by optimizing the net throughput for both the D2D pairs and the CUs, the suggested method establishes the ideal location for the RIS-mounted UAV. We consider both line of sight (LoS) and non-LoS paths for the RIS and UAV to calculate the throughput while accounting for blockages in the system. The numerical results show that the proposed method performs better than the existing approaches in terms of both the net throughput and the user fairness.
comment: Submitted for a possible publication
MEbots: Integrating a RISC-V Virtual Platform with a Robotic Simulator for Energy-aware Design
Virtual Platforms (VPs) enable early software validation of autonomous systems' electronics, reducing costs and time-to-market. While many VPs support both functional and non-functional simulation (e.g., timing, power), they lack the capability of simulating the environment in which the system operates. In contrast, robotics simulators lack accurate timing and power features. This twofold shortcoming limits the effectiveness of the design flow, as the designer can not fully evaluate the features of the solution under development. This paper presents a novel, fully open-source framework bridging this gap by integrating a robotics simulator (Webots) with a VP for RISC-V-based systems (MESSY). The framework enables a holistic, mission-level, energy-aware co-simulation of electronics in their surrounding environment, streamlining the exploration of design configurations and advanced power management policies.
Data Center Model for Transient Stability Analysis of Power Systems
The rising demand of computing power leads to the installation of a large number of Data Centers (DCs). Their Fault-Ride-Through (FRT) behavior and their unique power characteristics, especially for DCs catered to Artificial Intelligence (AI) workloads, pose a threat to the stability of power systems. To ensure its stability, it is required accurate models of the loads involved. Here we propose a dynamic load model that properly captures the behaviour of DCs. Its three most defining features are the use of an Uninterrupted Power Supply (UPS) which sits between the server load and the grid, the cooling load represented by an induction motor, and a pulsing load that represents the transients caused by contemporary DCs with significant AI workloads. The features of the proposed model and its impact on the dynamic performance of transmission systems are illustrated through a model of the all-island Irish transmission system and real-world data of the DCs currently connected to this system.
Neural network based control of unknown nonlinear systems via contraction analysis
This paper studies the design of neural network (NN)-based controllers for unknown nonlinear systems, using contraction analysis. A Neural Ordinary Differential Equation (NODE) system is constructed by approximating the unknown draft dynamics with a feedforward NN. Incremental sector bounds and contraction theory are applied to the activation functions and the weights of the NN, respectively. It is demonstrated that if the incremental sector bounds and the weights satisfy some non-convex conditions, the NODE system is contractive. To improve computational efficiency, these non-convex conditions are reformulated as convex LMI conditions. Additionally, it is proven that when the NODE system is contractive, the trajectories of the original autonomous system converge to a neighborhood of the unknown equilibrium, with the size of this neighborhood determined by the approximation error. For a single-layer NN, the NODE system is simplified to a continuous-time Hopfield NN. If the NODE system does not satisfy the contraction conditions, an NN-based controller is designed to enforce contractivity. This controller integrates a linear component, which ensures contraction through suitable control gains, and an NN component, which compensates for the NODE system's nonlinearities. This integrated controller guarantees that the trajectories of the original affine system converge to a neighborhood of the unknown equilibrium. The effectiveness of the proposed approach is demonstrated through two illustrative examples.
SpineWave: Harnessing Fish Rigid-Flexible Spinal Kinematics for Enhancing Biomimetic Robotic Locomotion
Fish have endured millions of years of evolution, and their distinct rigid-flexible body structures offer inspiration for overcoming challenges in underwater robotics, such as limited mobility, high energy consumption, and adaptability. This paper introduces SpineWave, a biomimetic robotic fish featuring a fish-spine-like rigid-flexible transition structure. The structure integrates expandable fishbone-like ribs and adjustable magnets, mimicking the stretch and recoil of fish muscles to balance rigidity and flexibility. In addition, we employed an evolutionary algorithm to optimize the hydrodynamics of the robot, achieving significant improvements in swimming performance. Real-world tests demonstrated robustness and potential for environmental monitoring, underwater exploration, and industrial inspection. These tests established SpineWave as a transformative platform for aquatic robotics.
Robust Look-ahead Pursuit Control for Three-Dimensional Path Following within Finite-Time Stability Guarantee
This paper addresses the challenging problem of robust path following for fixed-wing unmanned aerial vehicles (UAVs) in complex environments with bounded external disturbances and non-smooth predefined paths. Due to the unique aerodynamic characteristics and flight constraints of fixed-wing UAVs, achieving accurate and stable path following remains difficult, especially in low-altitude mountainous terrains, urban landscapes, and under wind disturbances. Traditional path-following guidance laws often struggle with rapid stabilization and constrained input commands under unknown disturbances while maintaining robustness. To overcome these limitations, we propose a robust nonlinear path-following guidance law that considers the flight path angle and track angle, and dynamically adjusts controller parameters to achieve optimal compensation for acceleration increments. The proposed guidance law guarantees finite-time stability, reduced sensitivity to constrained uncertainties, and consistent behavior compared to traditional asymptotic convergence controllers. Additionally, it ensures that the UAV approaches mobile virtual target points in the shortest possible time while adhering to input constrained conditions. Our contributions include a thorough analysis of the conditions for robust stability, the derivation of the guidance law, and simulations demonstrating its effectiveness. The results show that the proposed guidance law significantly improves path-following performance under external disturbances, making it a promising solution for autonomous missions execution of fixed-wing UAVs.
comment: 16 pages, 14 figures
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA) scheme, aiming at precisely introducing the extent of poisoning in a subtle controlled manner. It operates with a predefined objective, such as reducing global model's prediction accuracy by 10\%. FedSA integrates robust nonlinear control-Sliding Mode Control (SMC) theory with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
Trajectory-Independent Flexibility Envelopes of Energy-Constrained Systems with State-Dependent Losses
As non-dispatchable renewable power units become prominent in electric power grids, demand-side flexibility appears as a key element of future power systems' operation. Power and energy bounds are intuitive metrics to describe the flexibility of energy-constrained loads. However, to be used in operation, any power consumption trajectory fulfilling the power and energy bounds must necessarily fulfill the load's constraints. In this paper, we demonstrate that energy bounds defined as the minimum and maximum energy consumption potential of a load with state-dependent losses are Trajectory-Dependent (TD), i.e., for any energy value in the bounds a feasible power trajectory exists, but not all power trajectories enclosed in the energy envelopes satisfy the load's constraints. To guarantee the satisfaction of load constraints for all trajectories, we define Trajectory-Independent (TI) energy bounds. We present TI envelope formulations for individual loads, as well as physically coupled loads and assess the proposed formulations in a building heating system, a system with state-dependent losses. We find that using a TD envelope as energy bounds in operation may yield room temperature up to 3.8{\deg}C higher and 3.4{\deg}C lower than admissible. Overall, poorly insulated buildings observe a TI energy envelope that differs significantly from their TD envelope.
comment: 10 pages
Coverage Path Planning For Multi-view SAR-UAV Observation System Under Energy Constraint
Multi-view Synthetic Aperture Radar (SAR) imaging can effectively enhance the performance of tasks such as automatic target recognition and image information fusion. Unmanned aerial vehicles (UAVs) have the advantages of flexible deployment and cost reduction. A swarm of UAVs equipped with synthetic aperture radar imaging equipment is well suited to meet the functional requirements of multi-view synthetic aperture radar imaging missions. However, to provide optimal paths for SAR-UAVs from the base station to cover target viewpoints in the mission area is of NP-hard computational complexity. In this work, the coverage path planning problem for multi-view SAR-UAV observation systems is studied. First, the coordinate of observation viewpoints is calculated based on the location of targets and base station under a brief geometric model. Then, the exact problem formulation is modeled in order to fully describe the solution space and search for optimal paths that provide maximum coverage rate for SAR-UAVs. Finally, an Adaptive Density Peak Clustering (ADPC) method is proposed to overcome the additional energy consumption due to the viewpoints being far away from the base station. The Particle Swarm Optimization (PSO) algorithm is introduced for optimal path generation. Experimental results demonstrate the effectiveness and computational efficiency of the proposed approach.
Scalable and Efficient Aggregation of Energy-Constrained Flexible Loads
Loads represent a promising flexibility source to support the integration of renewable energy sources, as they may shift their energy consumption over time. By computing the aggregated flexibility of power and energy-constrained loads, aggregators can communicate the group's flexibility without sharing individual private information. However, this computation is, in practice, challenging. Some studies suggest different inner approximations of aggregated flexibility polytopes, but all suffer from large computational costs for realistic load numbers and horizon lengths. In this paper, we develop a novel approximation of the aggregated flexibility of loads based on the concept of worst-case energy dispatch, i.e., if aggregated energy consumptions are assumed to be dispatched in the worst manner possible. This leads to conservative piecewise linear bounds that restrict the aggregated energy consumption only based on the previous aggregated energy consumed. A comparative case study reveals that our method can compute an approximation of the aggregation of thousands of loads efficiently, while displaying an accuracy comparable to other approximation techniques.
comment: 6 pages
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies
When applying offline reinforcement learning (RL) in healthcare scenarios, the out-of-distribution (OOD) issues pose significant risks, as inappropriate generalization beyond clinical expertise can result in potentially harmful recommendations. While existing methods like conservative Q-learning (CQL) attempt to address the OOD issue, their effectiveness is limited by only constraining action selection by suppressing uncertain actions. This action-only regularization imitates clinician actions that prioritize short-term rewards, but it fails to regulate downstream state trajectories, thereby limiting the discovery of improved long-term treatment strategies. To safely improve policy beyond clinician recommendations while ensuring that state-action trajectories remain in-distribution, we propose \textit{Offline Guarded Safe Reinforcement Learning} ($\mathsf{OGSRL}$), a theoretically grounded model-based offline RL framework. $\mathsf{OGSRL}$ introduces a novel dual constraint mechanism for improving policy with reliability and safety. First, the OOD guardian is established to specify clinically validated regions for safe policy exploration. By constraining optimization within these regions, it enables the reliable exploration of treatment strategies that outperform clinician behavior by leveraging the full patient state history, without drifting into unsupported state-action trajectories. Second, we introduce a safety cost constraint that encodes medical knowledge about physiological safety boundaries, providing domain-specific safeguards even in areas where training data might contain potentially unsafe interventions. Notably, we provide theoretical guarantees on safety and near-optimality: policies that satisfy these constraints remain in safe and reliable regions and achieve performance close to the best possible policy supported by the data.
Partitioning and Observability in Linear Systems via Submodular Optimization
Network partitioning has gained recent attention as a pathway to enable decentralized operation and control in large-scale systems. This paper addresses the interplay between partitioning, observability, and sensor placement (SP) in dynamic networks. The problem, being computationally intractable at scale, is largely unexplored in the literature. To that end, the paper's objective is designing scalable partitioning of linear systems while maximizing observability metrics of the subsystems. We show that the partitioning problem can be posed as a submodular maximization problem -- and the SP problem can subsequently be solved over the partitioned network. Consequently, theoretical bounds are derived to compare observability metrics of the original network with those of the resulting partitions, highlighting the impact of partitioning on system observability. Case studies on networks of varying sizes corroborate the derived theoretical bounds.
Unified Multi-Rate Model Predictive Control for a Jet-Powered Humanoid Robot
We propose a novel Model Predictive Control (MPC) framework for a jet-powered flying humanoid robot. The controller is based on a linearised centroidal momentum model to represent the flight dynamics, augmented with a second-order nonlinear model to explicitly account for the slow and nonlinear dynamics of jet propulsion. A key contribution is the introduction of a multi-rate MPC formulation that handles the different actuation rates of the robot's joints and jet engines while embedding the jet dynamics directly into the predictive model. We validated the framework using the jet-powered humanoid robot iRonCub, performing simulations in Mujoco; the simulation results demonstrate the robot's ability to recover from external disturbances and perform stable, non-abrupt flight manoeuvres, validating the effectiveness of the proposed approach.
comment: 8 pages, 6 figures
Control of Renewable Energy Communities using AI and Real-World Data
The electrification of transportation and the increased adoption of decentralized renewable energy generation have added complexity to managing Renewable Energy Communities (RECs). Integrating Electric Vehicle (EV) charging with building energy systems like heating, ventilation, air conditioning (HVAC), photovoltaic (PV) generation, and battery storage presents significant opportunities but also practical challenges. Reinforcement learning (RL), particularly MultiAgent Deep Deterministic Policy Gradient (MADDPG) algorithms, have shown promising results in simulation, outperforming heuristic control strategies. However, translating these successes into real-world deployments faces substantial challenges, including incomplete and noisy data, integration of heterogeneous subsystems, synchronization issues, unpredictable occupant behavior, and missing critical EV state-of-charge (SoC) information. This paper introduces a framework designed explicitly to handle these complexities and bridge the simulation to-reality gap. The framework incorporates EnergAIze, a MADDPG-based multi-agent control strategy, and specifically addresses challenges related to real-world data collection, system integration, and user behavior modeling. Preliminary results collected from a real-world operational REC with four residential buildings demonstrate the practical feasibility of our approach, achieving an average 9% reduction in daily peak demand and a 5% decrease in energy costs through optimized load scheduling and EV charging behaviors. These outcomes underscore the framework's effectiveness, advancing the practical deployment of intelligent energy management solutions in RECs.
comment: 8 pages, 3 figures, 1 table, 30th IEEE International Conference on Emerging Technologies and Factory Automation
UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing
Gesture recognition presents a promising avenue for interfacing with unmanned aerial vehicles (UAVs) due to its intuitive nature and potential for precise interaction. This research conducts a comprehensive comparative analysis of vision-based hand gesture detection methodologies tailored for UAV Control. The existing gesture recognition approaches involving cropping, zooming, and color-based segmentation, do not work well for this kind of applications in dynamic conditions and suffer in performance with increasing distance and environmental noises. We propose to use a novel approach leveraging hand landmarks drawing and classification for gesture recognition based UAV control. With experimental results we show that our proposed method outperforms the other existing methods in terms of accuracy, noise resilience, and efficacy across varying distances, thus providing robust control decisions. However, implementing the deep learning based compute intensive gesture recognition algorithms on the UAV's onboard computer is significantly challenging in terms of performance. Hence, we propose to use a edge-computing based framework to offload the heavier computing tasks, thus achieving closed-loop real-time performance. With implementation over AirSim simulator as well as over a real-world UAV, we showcase the advantage of our end-to-end gesture recognition based UAV control system.
Construction of an Impedance Control Test Bench
Controlling the physical interaction with the environment or objects, as humans do, is a shared requirement across different types of robots. To effectively control this interaction, it is necessary to control the power delivered to the load, that is, the interaction force and the interaction velocity. However, it is not possible to control these two quantities independently at the same time. An alternative is to control the relation between them, with Impedance and Admittance control, for example. The Impedance Control 2 Dimensions (IC2D) bench is a test bench designed to allow the performance analysis of different actuators and controllers at the joint level. Therefore, it was designed to be as versatile as possible, to allow the combination of linear and/or rotational motions, to use electric and/or hydraulic actuators, with loads known and defined by the user. The bench adheres to a set of requirements defined by the demands of the research group, to be a reliable, backlash-free mechatronic system to validate system dynamics models and controller designs, as well as a valuable experimental setup for benchmarking electric and hydraulic actuators. This article presents the mechanical, electrical, and hydraulic configurations used to ensure the robustness and reliability of the test bench. Benches similar to this one are commonly found in robotics laboratories around the world. However, the IC2D stands out for its versatility and reliability, as well as for supporting hydraulic and electric actuators.
comment: 10 pages, 23 figures
ConvoyNext: A Scalable Testbed Platform for Cooperative Autonomous Vehicle Systems
The advancement of cooperative autonomous vehicle systems depends heavily on effective coordination between multiple agents, aiming to enhance traffic efficiency, fuel economy, and road safety. Despite these potential benefits, real-world testing of such systems remains a major challenge and is essential for validating control strategies, trajectory modeling methods, and communication robustness across diverse environments. To address this need, we introduce ConvoyNext, a scalable, modular, and extensible platform tailored for the real-world evaluation of cooperative driving behaviors. We demonstrate the capabilities of ConvoyNext through a series of experiments involving convoys of autonomous vehicles navigating complex trajectories. These tests highlight the platform's robustness across heterogeneous vehicle configurations and its effectiveness in assessing convoy behavior under varying communication conditions, including intentional packet loss. Our results validate ConvoyNext as a comprehensive, open-access testbed for advancing research in cooperative autonomous vehicle systems.
Navigating Polytopes with Safety: A Control Barrier Function Approach
Collision-free motion is a fundamental requirement for many autonomous systems. This paper develops a safety-critical control approach for the collision-free navigation of polytope-shaped agents in polytope-shaped environments. A systematic method is proposed to generate control barrier function candidates in closed form that lead to controllers with formal safety guarantees. The proposed approach is demonstrated through simulation, with obstacle avoidance examples in 2D and 3D, including dynamically changing environments.
comment: Accepted to the 2025 IEEE Conference on Control Technology and Applications (CCTA). 6 pages, 5 figures
IAE Optimized PID Tuning via Second Order Step Response Target Matching
This paper presents SOSTIAE (Second-Order System Target IAE), a novel PID tuning method that combines IAE minimization with explicit transient response shaping for practical control applications. The algorithm generates optimal PID parameters by matching the closed-loop response to a target second-order system with user-defined settling time (T_s) and percent overshoot (PO), while maintaining the conventional IAE performance metric. Comparative evaluations on first to third-order systems demonstrate that SOSTIAE consistently outperforms MATLAB's proprietary pidtune() function, achieving 47-67% lower overshoot and up to 26% better IAE performance for higher-order plants. The constrained optimization framework ensures physically realizable controllers by enforcing non-negative PID gains and stability criteria, addressing known limitations of unconstrained IAE methods. Results indicate that SOSTIAE provides engineers with a systematic alternative for PID tuning when transient specifications and practical implementation constraints are critical.
comment: 10 pages, 3 figures, 1 table
Multi-Objective Optimization Algorithms for Energy Management Systems in Microgrids: A Control Strategy Based on a PHIL System
In this research a real time power hardware in loop configuration has been implemented for an microgrid with the combination of distribution energy resources such as photovoltaic, grid tied inverter, battery, utility grid, and a diesel generator. This paper introduces an unique adaptive multi-objective optimization approach that employs weighted optimization techniques for real-time microgrid systems. The aim is to effectively balance various factors including fuel consumption, load mismatch, power quality, battery degradation, and the utilization of renewable energy sources. A real time experimental data from power hardware in loop system has been used for dynamically updating system states. The adaptive preference-based selection method are adjusted based on state of battery charging thresholds. The technique has been integrated with six technical objectives and complex constraints. This approach helps to practical microgrid decision making and optimization of dynamic energy systems. The energy management process were also able to maximize photovoltaic production where minimizing power mismatch, stabilizing battery state of charge under different condition. The research results were also compared with the baseline system without optimization techniques, and a reliable outcome was found.
Regularized Model Predictive Control
In model predictive control (MPC), the choice of cost-weighting matrices and designing the Hessian matrix directly affects the trade-off between rapid state regulation and minimizing the control effort. However, traditional MPC in quadratic programming relies on fixed design matrices across the entire horizon, which can lead to suboptimal performance. This letter presents a Riccati equation-based method for adjusting the design matrix within the MPC framework, which enhances real-time performance. We employ a penalized least-squares (PLS) approach to derive a quadratic cost function for a discrete-time linear system over a finite prediction horizon. Using the method of weighting and enforcing the constraint equation by introducing a large penalty parameter, we solve the constrained optimization problem and generate control inputs for forward-shifted horizons. This process yields a recursive PLS-based Riccati equation that updates the design matrix as a regularization term in each shift, forming the foundation of the regularized MPC (Re-MPC) algorithm. To accomplish this, we provide a convergence and stability analysis of the developed algorithm. Numerical analysis demonstrates its superiority over traditional methods by allowing Riccati equation-based adjustments.
A Hybrid Systems Model of Feedback Optimization for Linear Systems
Feedback optimization algorithms compute inputs to a system in real time, which helps mitigate the effects of unknown disturbances. However, existing work models both system dynamics and computations in either discrete or continuous time, which does not faithfully model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems framework for modeling feedback optimization of linear time-invariant systems that are subject to unknown, constant disturbances. For this setup, we first establish the well-posedness of the hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then, our main result derives a convergence rate and an error bound for the full hybrid computation-in-theloop system and shows that it converges exponentially towards a ball of known radius about a desired fixed point. Simulation results show that this approach successfully mitigates the effects of disturbances, with the magnitude of steady-state error being 81% less than the magnitude of the disturbances in the system.
comment: 14 Pages, 3 Figures, submitted to Conference on Decision and Control 2025
Fast computation of the TGOSPA metric for multiple target tracking via unbalanced optimal transport
In multiple target tracking, it is important to be able to evaluate the performance of different tracking algorithms. The trajectory generalized optimal sub-pattern assignment metric (TGOSPA) is a recently proposed metric for such evaluations. The TGOSPA metric is computed as the solution to an optimization problem, but for large tracking scenarios, solving this problem becomes computationally demanding. In this paper, we present an approximation algorithm for evaluating the TGOSPA metric, based on casting the TGOSPA problem as an unbalanced multimarginal optimal transport problem. Following recent advances in computational optimal transport, we introduce an entropy regularization and derive an iterative scheme for solving the Lagrangian dual of the regularized problem. Numerical results suggest that our proposed algorithm is more computationally efficient than the alternative of computing the exact metric using a linear programming solver, while still providing an adequate approximation of the metric.
comment: 6 pages, 3 figures. Revision
Choose Wisely: Data-driven Predictive Control for Nonlinear Systems Using Online Data Selection
This paper proposes Select-Data-driven Predictive Control (Select-DPC), a new method for controlling nonlinear systems using output-feedback for which data are available but an explicit model is not. At each timestep, Select-DPC employs only the most relevant data to implicitly linearize the dynamics in "trajectory space". Then, taking user-defined output constraints into account, it makes control decisions using a convex optimization. This optimal control is applied in a receding-horizon manner. As the online data-selection is the core of Select-DPC, we propose and verify both norm-based and manifold-embedding-based selection methods. We evaluate Select-DPC on three benchmark nonlinear system simulators -- rocket-landing, a robotic arm and cart-pole inverted pendulum swing-up -- comparing them with standard Data-enabled Predictive Control (DeePC) and Time-Windowed DeePC methods, and find that Select-DPC outperforms both methods.
Distribution System Reconfiguration to Mitigate Load Altering Attacks via Stackelberg Games
The integration of IoT-controllable devices in power systems (such as smart electric vehicle charging stations, heat pumps, etc.), despite their benefits, raises novel cybersecurity concerns. Vulnerabilities in these devices can be leveraged to launch load-altering attacks (LAAs) that can potentially compromise the safety of power systems. In this paper, we analyze the impact of LAAs on the voltage profile of distribution networks (DNs). We first derive closed-form expressions to quantify the attacks' impact. Using the insights derived from this analysis, we then propose a reactive defense method to mitigate LAAs based on reconfiguring the DNs. We also study optimal defense strategies that are robust to LAAs by exploiting non-cooperative sequential game theory. The proposed solution takes into account the potential uncertainties in the attack localization. Furthermore, we propose a Bayesian optimization (BO) approach to compute the equilibrium of the game, which reduces the computational burden. Our results show that attacks launched on the deepest nodes in the DN have the most detrimental effect on the grid voltage profile. Furthermore, the proposed game-theoretic strategy successfully mitigates the effect of the attack while ensuring minimum system reconfiguration.
Distributed alternating gradient descent for convex semi-infinite programs over a network
This paper presents a first-order distributed algorithm for solving a convex semi-infinite program (SIP) over a time-varying network. In this setting, the objective function associated with the optimization problem is a summation of a set of functions, each held by one node in a network. The semi-infinite constraint, on the other hand, is known to all agents. The nodes collectively aim to solve the problem using local data about the objective and limited communication capabilities depending on the network topology. Our algorithm is built on three key ingredients: consensus step, gradient descent in the local objective, and local gradient descent iterations in the constraint at a node when the estimate violates the semi-infinite constraint. The algorithm is constructed, and its parameters are prescribed in such a way that the iterates held by each agent provably converge to an optimizer. That is, as the algorithm progresses, the estimates achieve consensus, and the constraint violation and the error in the optimal value are bounded above by vanishing terms. Simulation examples illustrate our results.
comment: 16 pages, 1 figure
Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots
Mobile smartphones compactly provide sensors such as cameras, IMUs, GNSS measurement units, and wireless and wired communication channels required for robotics projects. They are affordable, portable, and programmable, which makes them ideal for testing, data acquisition, controlling mobile robots, and many other robotic applications. A robotic system is proposed in this paper, consisting of an Android phone, a microcontroller board attached to the phone via USB, and a remote wireless controller station. In the data acquisition mode, the Android device can record a dataset of a diverse configuration of multiple cameras, IMUs, GNSS units, and external USB ADC channels in the rawest format used for, but not limited to, pose estimation and scene reconstruction applications. In robot control mode, the Android phone, a microcontroller board, and other peripherals constitute the mobile or stationary robotic system. This system is controlled using a remote server connected over Wi-Fi or Bluetooth. Experiments show that although the SLAM and AR applications can utilize the acquired data, the proposed system can pave the way for more advanced algorithms for processing these noisy and sporadic measurements. Moreover, the characteristics of the communication media are studied, and two example robotic projects, which involve controlling a toy car and a quadcopter, are included.
comment: Project repository: https://github.com/m-dayani/robo-platform Youtube Video: https://youtu.be/BTQ4yLB1bak Dataset: https://drive.google.com/drive/folders/1OZqdA1xa-SyJ64qL_TibqhtwhR1fWWrx?usp=sharing
Physics-Informed Neural Network-Based Control for Grid-Forming Converter's Stability Under Overload Conditions
Grid-forming converters (GFCs) are crucial for frequency and voltage stability in modern power systems. However, their performance under overload conditions remains a challenge. This paper highlights the limitations of existing approaches in managing DC source saturation and AC current limits, emphasizing the need for improved control strategies to ensure system stability. This paper proposes a control strategy based on a physics-informed neural network (PINN) to improve GFC performance under overloaded conditions, effectively preventing switch failures and mitigating DC source saturation. This approach outperforms conventional methods by maintaining stable voltage and frequency, even under significant load increase where traditional droop control alone proves inadequate. The post-disturbance operating point of GFCs remains unchanged using PINN-based control with an improvement of 0.245 Hz in frequency and 0.03 p.u. in active power when compared to an already existing current limitation strategy. Additionally, it reduces peak voltage deviations during transients by 24.14\%, lowers the rate of change of frequency (ROCOF) from 0.02 Hz/s to 0.005 Hz/s, and improves the rate of change of voltage (ROCOV), keeping both within acceptable limits. These improvements significantly enhance system resilience, especially in inertia-less power networks.
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that the clutch can deliver a peak holding force of 380 N with a 1.48 W consumption, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
Physics-informed machine learning for building performance simulation-A review of a nascent field
Building performance simulation (BPS) is critical for understanding building dynamics and behavior, analyzing performance of the built environment, optimizing energy efficiency, improving demand flexibility, and enhancing building resilience. However, conducting BPS is not trivial. Traditional BPS relies on an accurate building energy model, mostly physics-based, which depends heavily on detailed building information, expert knowledge, and case-by-case model calibrations, thereby significantly limiting their scalability. With the development of sensing technology and increased data availability, there is a growing attention and interest in data-driven BPS. However, purely data-driven models often suffer from limited generalization ability and a lack of physical consistency, resulting in poor performance in real-world applications. To address these limitations, recent studies have started to incorporate physics priors into data-driven models, a methodology called physics-informed machine learning (PIML). PIML is an emerging field with the definitions, methodologies, evaluation criteria, application scenarios, and future directions that remain open. To bridge those gaps, this study systematically reviews the state-of-art PIML for BPS, offering a comprehensive definition of PIML, and comparing it to traditional BPS approaches regarding data requirements, modeling effort, performance and computation cost. We also summarize the commonly used methodologies, validation approaches, application domains, available data sources, open-source packages and testbeds. In addition, this study provides a general guideline for selecting appropriate PIML models based on BPS applications. Finally, this study identifies key challenges and outlines future research directions, providing a solid foundation and valuable insights to advance R&D of PIML in BPS.
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
SINDyG: Sparse Identification of Nonlinear Dynamical Systems from Graph-Structured Data
The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. We showcase the application of our proposed method using several case studies of neuronal dynamics, where we model the macroscopic oscillation of a population of neurons using the extended Stuart-Landau (SL) equation and utilize the SINDyG method to identify the underlying nonlinear dynamics. Our extensive computational experiments validate the improved accuracy and simplicity of discovered network dynamics when compared to the original SINDy approach.
Safe PDE Boundary Control with Neural Operators
The physical world dynamics are generally governed by underlying partial differential equations (PDEs) with unknown analytical forms in science and engineering problems. Neural network based data-driven approaches have been heavily studied in simulating and solving PDE problems in recent years, but it is still challenging to move forward from understanding to controlling the unknown PDE dynamics. PDE boundary control instantiates a simplified but important problem by only focusing on PDE boundary conditions as the control input and output. However, current model-free PDE controllers cannot ensure the boundary output satisfies some given user-specified safety constraint. To this end, we propose a safety filtering framework to guarantee the boundary output stays within the safe set for current model-free controllers. Specifically, we first introduce a neural boundary control barrier function (BCBF) to ensure the feasibility of the trajectory-wise constraint satisfaction of boundary output. Based on the neural operator modeling the transfer function from boundary control input to output trajectories, we show that the change in the BCBF depends linearly on the change in input boundary, so quadratic programming-based safety filtering can be done for pre-trained model-free controllers. Extensive experiments under challenging hyperbolic, parabolic and Navier-Stokes PDE dynamics environments validate the plug-and-play effectiveness of the proposed method by achieving better general performance and boundary constraint satisfaction compared to the vanilla and constrained model-free controller baselines. The code is available at https://github.com/intelligent-control-lab/safe-pde-control.
comment: L4DC 2025 full version, 28 pages, 5 figures, 8 tables
Systems and Control (EESS)
Delayed dynamic-feedback controller design for multi-frequency vibration suppression
We present a methodology for designing a dynamic controller with delayed output feedback for achieving non-collocated vibration suppression with a focus on the multi-frequency case. To synthesize the delay-based controller, we first remodel the system of equations as a delay-differential algebraic equation (DDAE) in such a way that existing tools for design of a static output feedback controller can be easily adapted. The problem of achieving non-collocated vibration suppression with sufficient damping is formulated as a constrained optimization problem of minimizing the spectral abscissa in the presence of zero-location constraints, with the constraints exhibiting polynomial dependence on its parameters. We transform the problem into an unconstrained one using elimination, following which we solve the resulting non-convex, non-smooth optimization problem.
comment: 7 pages, 4 figures, submitted April 24, 2024, to Time Delay Systems (TDS 2024)
Modeling and Constraint-Aware Control of Pressure Dynamics in Water Electrolysis Systems
This paper addresses the challenge of pressure constraint violations in water electrolysis systems operating under dynamic power conditions, a problem common to both Proton Exchange Membrane and alkaline technologies. To investigate this issue, a control-oriented model of an alkaline electrolyzer is developed, capturing key pressure and flow dynamics. To manage rapid power fluctuations that may cause pressure to exceed manufacturer-defined operational boundaries, a model-based constraint-aware power governor based on the Reference Governor (RG) framework is proposed. Simulation results show that the strategy effectively maintains pressure within the specified operating range, outperforming conventional filtering methods while enhancing hydrogen production and reducing auxiliary energy consumption.
On the Deployment of RIS-mounted UAV Networks
Reconfigurable intelligent surfaces (RIS) enable smart wireless environments by dynamically controlling signal propagation to enhance communication and localization. Unmanned aerial vehicles (UAVs) can act as flying base stations and thus, improve system performance by avoiding signal blockages. In this paper, we propose a gradient ascent and coordinate search based method to determine the optimal location for a system that consists of a UAV and a RIS, where the UAV serves cellular users (CUs) and the RIS serves device-to-device (D2D) pairs. In particular, by optimizing the net throughput for both the D2D pairs and the CUs, the suggested method establishes the ideal location for the RIS-mounted UAV. We consider both line of sight (LoS) and non-LoS paths for the RIS and UAV to calculate the throughput while accounting for blockages in the system. The numerical results show that the proposed method performs better than the existing approaches in terms of both the net throughput and the user fairness.
comment: Submitted for a possible publication
MEbots: Integrating a RISC-V Virtual Platform with a Robotic Simulator for Energy-aware Design
Virtual Platforms (VPs) enable early software validation of autonomous systems' electronics, reducing costs and time-to-market. While many VPs support both functional and non-functional simulation (e.g., timing, power), they lack the capability of simulating the environment in which the system operates. In contrast, robotics simulators lack accurate timing and power features. This twofold shortcoming limits the effectiveness of the design flow, as the designer can not fully evaluate the features of the solution under development. This paper presents a novel, fully open-source framework bridging this gap by integrating a robotics simulator (Webots) with a VP for RISC-V-based systems (MESSY). The framework enables a holistic, mission-level, energy-aware co-simulation of electronics in their surrounding environment, streamlining the exploration of design configurations and advanced power management policies.
Data Center Model for Transient Stability Analysis of Power Systems
The rising demand of computing power leads to the installation of a large number of Data Centers (DCs). Their Fault-Ride-Through (FRT) behavior and their unique power characteristics, especially for DCs catered to Artificial Intelligence (AI) workloads, pose a threat to the stability of power systems. To ensure its stability, it is required accurate models of the loads involved. Here we propose a dynamic load model that properly captures the behaviour of DCs. Its three most defining features are the use of an Uninterrupted Power Supply (UPS) which sits between the server load and the grid, the cooling load represented by an induction motor, and a pulsing load that represents the transients caused by contemporary DCs with significant AI workloads. The features of the proposed model and its impact on the dynamic performance of transmission systems are illustrated through a model of the all-island Irish transmission system and real-world data of the DCs currently connected to this system.
Neural network based control of unknown nonlinear systems via contraction analysis
This paper studies the design of neural network (NN)-based controllers for unknown nonlinear systems, using contraction analysis. A Neural Ordinary Differential Equation (NODE) system is constructed by approximating the unknown draft dynamics with a feedforward NN. Incremental sector bounds and contraction theory are applied to the activation functions and the weights of the NN, respectively. It is demonstrated that if the incremental sector bounds and the weights satisfy some non-convex conditions, the NODE system is contractive. To improve computational efficiency, these non-convex conditions are reformulated as convex LMI conditions. Additionally, it is proven that when the NODE system is contractive, the trajectories of the original autonomous system converge to a neighborhood of the unknown equilibrium, with the size of this neighborhood determined by the approximation error. For a single-layer NN, the NODE system is simplified to a continuous-time Hopfield NN. If the NODE system does not satisfy the contraction conditions, an NN-based controller is designed to enforce contractivity. This controller integrates a linear component, which ensures contraction through suitable control gains, and an NN component, which compensates for the NODE system's nonlinearities. This integrated controller guarantees that the trajectories of the original affine system converge to a neighborhood of the unknown equilibrium. The effectiveness of the proposed approach is demonstrated through two illustrative examples.
SpineWave: Harnessing Fish Rigid-Flexible Spinal Kinematics for Enhancing Biomimetic Robotic Locomotion
Fish have endured millions of years of evolution, and their distinct rigid-flexible body structures offer inspiration for overcoming challenges in underwater robotics, such as limited mobility, high energy consumption, and adaptability. This paper introduces SpineWave, a biomimetic robotic fish featuring a fish-spine-like rigid-flexible transition structure. The structure integrates expandable fishbone-like ribs and adjustable magnets, mimicking the stretch and recoil of fish muscles to balance rigidity and flexibility. In addition, we employed an evolutionary algorithm to optimize the hydrodynamics of the robot, achieving significant improvements in swimming performance. Real-world tests demonstrated robustness and potential for environmental monitoring, underwater exploration, and industrial inspection. These tests established SpineWave as a transformative platform for aquatic robotics.
Robust Look-ahead Pursuit Control for Three-Dimensional Path Following within Finite-Time Stability Guarantee
This paper addresses the challenging problem of robust path following for fixed-wing unmanned aerial vehicles (UAVs) in complex environments with bounded external disturbances and non-smooth predefined paths. Due to the unique aerodynamic characteristics and flight constraints of fixed-wing UAVs, achieving accurate and stable path following remains difficult, especially in low-altitude mountainous terrains, urban landscapes, and under wind disturbances. Traditional path-following guidance laws often struggle with rapid stabilization and constrained input commands under unknown disturbances while maintaining robustness. To overcome these limitations, we propose a robust nonlinear path-following guidance law that considers the flight path angle and track angle, and dynamically adjusts controller parameters to achieve optimal compensation for acceleration increments. The proposed guidance law guarantees finite-time stability, reduced sensitivity to constrained uncertainties, and consistent behavior compared to traditional asymptotic convergence controllers. Additionally, it ensures that the UAV approaches mobile virtual target points in the shortest possible time while adhering to input constrained conditions. Our contributions include a thorough analysis of the conditions for robust stability, the derivation of the guidance law, and simulations demonstrating its effectiveness. The results show that the proposed guidance law significantly improves path-following performance under external disturbances, making it a promising solution for autonomous missions execution of fixed-wing UAVs.
comment: 16 pages, 14 figures
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA) scheme, aiming at precisely introducing the extent of poisoning in a subtle controlled manner. It operates with a predefined objective, such as reducing global model's prediction accuracy by 10\%. FedSA integrates robust nonlinear control-Sliding Mode Control (SMC) theory with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
Trajectory-Independent Flexibility Envelopes of Energy-Constrained Systems with State-Dependent Losses
As non-dispatchable renewable power units become prominent in electric power grids, demand-side flexibility appears as a key element of future power systems' operation. Power and energy bounds are intuitive metrics to describe the flexibility of energy-constrained loads. However, to be used in operation, any power consumption trajectory fulfilling the power and energy bounds must necessarily fulfill the load's constraints. In this paper, we demonstrate that energy bounds defined as the minimum and maximum energy consumption potential of a load with state-dependent losses are Trajectory-Dependent (TD), i.e., for any energy value in the bounds a feasible power trajectory exists, but not all power trajectories enclosed in the energy envelopes satisfy the load's constraints. To guarantee the satisfaction of load constraints for all trajectories, we define Trajectory-Independent (TI) energy bounds. We present TI envelope formulations for individual loads, as well as physically coupled loads and assess the proposed formulations in a building heating system, a system with state-dependent losses. We find that using a TD envelope as energy bounds in operation may yield room temperature up to 3.8{\deg}C higher and 3.4{\deg}C lower than admissible. Overall, poorly insulated buildings observe a TI energy envelope that differs significantly from their TD envelope.
comment: 10 pages
Coverage Path Planning For Multi-view SAR-UAV Observation System Under Energy Constraint
Multi-view Synthetic Aperture Radar (SAR) imaging can effectively enhance the performance of tasks such as automatic target recognition and image information fusion. Unmanned aerial vehicles (UAVs) have the advantages of flexible deployment and cost reduction. A swarm of UAVs equipped with synthetic aperture radar imaging equipment is well suited to meet the functional requirements of multi-view synthetic aperture radar imaging missions. However, to provide optimal paths for SAR-UAVs from the base station to cover target viewpoints in the mission area is of NP-hard computational complexity. In this work, the coverage path planning problem for multi-view SAR-UAV observation systems is studied. First, the coordinate of observation viewpoints is calculated based on the location of targets and base station under a brief geometric model. Then, the exact problem formulation is modeled in order to fully describe the solution space and search for optimal paths that provide maximum coverage rate for SAR-UAVs. Finally, an Adaptive Density Peak Clustering (ADPC) method is proposed to overcome the additional energy consumption due to the viewpoints being far away from the base station. The Particle Swarm Optimization (PSO) algorithm is introduced for optimal path generation. Experimental results demonstrate the effectiveness and computational efficiency of the proposed approach.
Scalable and Efficient Aggregation of Energy-Constrained Flexible Loads
Loads represent a promising flexibility source to support the integration of renewable energy sources, as they may shift their energy consumption over time. By computing the aggregated flexibility of power and energy-constrained loads, aggregators can communicate the group's flexibility without sharing individual private information. However, this computation is, in practice, challenging. Some studies suggest different inner approximations of aggregated flexibility polytopes, but all suffer from large computational costs for realistic load numbers and horizon lengths. In this paper, we develop a novel approximation of the aggregated flexibility of loads based on the concept of worst-case energy dispatch, i.e., if aggregated energy consumptions are assumed to be dispatched in the worst manner possible. This leads to conservative piecewise linear bounds that restrict the aggregated energy consumption only based on the previous aggregated energy consumed. A comparative case study reveals that our method can compute an approximation of the aggregation of thousands of loads efficiently, while displaying an accuracy comparable to other approximation techniques.
comment: 6 pages
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies
When applying offline reinforcement learning (RL) in healthcare scenarios, the out-of-distribution (OOD) issues pose significant risks, as inappropriate generalization beyond clinical expertise can result in potentially harmful recommendations. While existing methods like conservative Q-learning (CQL) attempt to address the OOD issue, their effectiveness is limited by only constraining action selection by suppressing uncertain actions. This action-only regularization imitates clinician actions that prioritize short-term rewards, but it fails to regulate downstream state trajectories, thereby limiting the discovery of improved long-term treatment strategies. To safely improve policy beyond clinician recommendations while ensuring that state-action trajectories remain in-distribution, we propose \textit{Offline Guarded Safe Reinforcement Learning} ($\mathsf{OGSRL}$), a theoretically grounded model-based offline RL framework. $\mathsf{OGSRL}$ introduces a novel dual constraint mechanism for improving policy with reliability and safety. First, the OOD guardian is established to specify clinically validated regions for safe policy exploration. By constraining optimization within these regions, it enables the reliable exploration of treatment strategies that outperform clinician behavior by leveraging the full patient state history, without drifting into unsupported state-action trajectories. Second, we introduce a safety cost constraint that encodes medical knowledge about physiological safety boundaries, providing domain-specific safeguards even in areas where training data might contain potentially unsafe interventions. Notably, we provide theoretical guarantees on safety and near-optimality: policies that satisfy these constraints remain in safe and reliable regions and achieve performance close to the best possible policy supported by the data.
Partitioning and Observability in Linear Systems via Submodular Optimization
Network partitioning has gained recent attention as a pathway to enable decentralized operation and control in large-scale systems. This paper addresses the interplay between partitioning, observability, and sensor placement (SP) in dynamic networks. The problem, being computationally intractable at scale, is largely unexplored in the literature. To that end, the paper's objective is designing scalable partitioning of linear systems while maximizing observability metrics of the subsystems. We show that the partitioning problem can be posed as a submodular maximization problem -- and the SP problem can subsequently be solved over the partitioned network. Consequently, theoretical bounds are derived to compare observability metrics of the original network with those of the resulting partitions, highlighting the impact of partitioning on system observability. Case studies on networks of varying sizes corroborate the derived theoretical bounds.
Unified Multi-Rate Model Predictive Control for a Jet-Powered Humanoid Robot
We propose a novel Model Predictive Control (MPC) framework for a jet-powered flying humanoid robot. The controller is based on a linearised centroidal momentum model to represent the flight dynamics, augmented with a second-order nonlinear model to explicitly account for the slow and nonlinear dynamics of jet propulsion. A key contribution is the introduction of a multi-rate MPC formulation that handles the different actuation rates of the robot's joints and jet engines while embedding the jet dynamics directly into the predictive model. We validated the framework using the jet-powered humanoid robot iRonCub, performing simulations in Mujoco; the simulation results demonstrate the robot's ability to recover from external disturbances and perform stable, non-abrupt flight manoeuvres, validating the effectiveness of the proposed approach.
comment: 8 pages, 6 figures
Control of Renewable Energy Communities using AI and Real-World Data
The electrification of transportation and the increased adoption of decentralized renewable energy generation have added complexity to managing Renewable Energy Communities (RECs). Integrating Electric Vehicle (EV) charging with building energy systems like heating, ventilation, air conditioning (HVAC), photovoltaic (PV) generation, and battery storage presents significant opportunities but also practical challenges. Reinforcement learning (RL), particularly MultiAgent Deep Deterministic Policy Gradient (MADDPG) algorithms, have shown promising results in simulation, outperforming heuristic control strategies. However, translating these successes into real-world deployments faces substantial challenges, including incomplete and noisy data, integration of heterogeneous subsystems, synchronization issues, unpredictable occupant behavior, and missing critical EV state-of-charge (SoC) information. This paper introduces a framework designed explicitly to handle these complexities and bridge the simulation to-reality gap. The framework incorporates EnergAIze, a MADDPG-based multi-agent control strategy, and specifically addresses challenges related to real-world data collection, system integration, and user behavior modeling. Preliminary results collected from a real-world operational REC with four residential buildings demonstrate the practical feasibility of our approach, achieving an average 9% reduction in daily peak demand and a 5% decrease in energy costs through optimized load scheduling and EV charging behaviors. These outcomes underscore the framework's effectiveness, advancing the practical deployment of intelligent energy management solutions in RECs.
comment: 8 pages, 3 figures, 1 table, 30th IEEE International Conference on Emerging Technologies and Factory Automation
UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing
Gesture recognition presents a promising avenue for interfacing with unmanned aerial vehicles (UAVs) due to its intuitive nature and potential for precise interaction. This research conducts a comprehensive comparative analysis of vision-based hand gesture detection methodologies tailored for UAV Control. The existing gesture recognition approaches involving cropping, zooming, and color-based segmentation, do not work well for this kind of applications in dynamic conditions and suffer in performance with increasing distance and environmental noises. We propose to use a novel approach leveraging hand landmarks drawing and classification for gesture recognition based UAV control. With experimental results we show that our proposed method outperforms the other existing methods in terms of accuracy, noise resilience, and efficacy across varying distances, thus providing robust control decisions. However, implementing the deep learning based compute intensive gesture recognition algorithms on the UAV's onboard computer is significantly challenging in terms of performance. Hence, we propose to use a edge-computing based framework to offload the heavier computing tasks, thus achieving closed-loop real-time performance. With implementation over AirSim simulator as well as over a real-world UAV, we showcase the advantage of our end-to-end gesture recognition based UAV control system.
Construction of an Impedance Control Test Bench
Controlling the physical interaction with the environment or objects, as humans do, is a shared requirement across different types of robots. To effectively control this interaction, it is necessary to control the power delivered to the load, that is, the interaction force and the interaction velocity. However, it is not possible to control these two quantities independently at the same time. An alternative is to control the relation between them, with Impedance and Admittance control, for example. The Impedance Control 2 Dimensions (IC2D) bench is a test bench designed to allow the performance analysis of different actuators and controllers at the joint level. Therefore, it was designed to be as versatile as possible, to allow the combination of linear and/or rotational motions, to use electric and/or hydraulic actuators, with loads known and defined by the user. The bench adheres to a set of requirements defined by the demands of the research group, to be a reliable, backlash-free mechatronic system to validate system dynamics models and controller designs, as well as a valuable experimental setup for benchmarking electric and hydraulic actuators. This article presents the mechanical, electrical, and hydraulic configurations used to ensure the robustness and reliability of the test bench. Benches similar to this one are commonly found in robotics laboratories around the world. However, the IC2D stands out for its versatility and reliability, as well as for supporting hydraulic and electric actuators.
comment: 10 pages, 23 figures
ConvoyNext: A Scalable Testbed Platform for Cooperative Autonomous Vehicle Systems
The advancement of cooperative autonomous vehicle systems depends heavily on effective coordination between multiple agents, aiming to enhance traffic efficiency, fuel economy, and road safety. Despite these potential benefits, real-world testing of such systems remains a major challenge and is essential for validating control strategies, trajectory modeling methods, and communication robustness across diverse environments. To address this need, we introduce ConvoyNext, a scalable, modular, and extensible platform tailored for the real-world evaluation of cooperative driving behaviors. We demonstrate the capabilities of ConvoyNext through a series of experiments involving convoys of autonomous vehicles navigating complex trajectories. These tests highlight the platform's robustness across heterogeneous vehicle configurations and its effectiveness in assessing convoy behavior under varying communication conditions, including intentional packet loss. Our results validate ConvoyNext as a comprehensive, open-access testbed for advancing research in cooperative autonomous vehicle systems.
Navigating Polytopes with Safety: A Control Barrier Function Approach
Collision-free motion is a fundamental requirement for many autonomous systems. This paper develops a safety-critical control approach for the collision-free navigation of polytope-shaped agents in polytope-shaped environments. A systematic method is proposed to generate control barrier function candidates in closed form that lead to controllers with formal safety guarantees. The proposed approach is demonstrated through simulation, with obstacle avoidance examples in 2D and 3D, including dynamically changing environments.
comment: Accepted to the 2025 IEEE Conference on Control Technology and Applications (CCTA). 6 pages, 5 figures
IAE Optimized PID Tuning via Second Order Step Response Target Matching
This paper presents SOSTIAE (Second-Order System Target IAE), a novel PID tuning method that combines IAE minimization with explicit transient response shaping for practical control applications. The algorithm generates optimal PID parameters by matching the closed-loop response to a target second-order system with user-defined settling time (T_s) and percent overshoot (PO), while maintaining the conventional IAE performance metric. Comparative evaluations on first to third-order systems demonstrate that SOSTIAE consistently outperforms MATLAB's proprietary pidtune() function, achieving 47-67% lower overshoot and up to 26% better IAE performance for higher-order plants. The constrained optimization framework ensures physically realizable controllers by enforcing non-negative PID gains and stability criteria, addressing known limitations of unconstrained IAE methods. Results indicate that SOSTIAE provides engineers with a systematic alternative for PID tuning when transient specifications and practical implementation constraints are critical.
comment: 10 pages, 3 figures, 1 table
Multi-Objective Optimization Algorithms for Energy Management Systems in Microgrids: A Control Strategy Based on a PHIL System
In this research a real time power hardware in loop configuration has been implemented for an microgrid with the combination of distribution energy resources such as photovoltaic, grid tied inverter, battery, utility grid, and a diesel generator. This paper introduces an unique adaptive multi-objective optimization approach that employs weighted optimization techniques for real-time microgrid systems. The aim is to effectively balance various factors including fuel consumption, load mismatch, power quality, battery degradation, and the utilization of renewable energy sources. A real time experimental data from power hardware in loop system has been used for dynamically updating system states. The adaptive preference-based selection method are adjusted based on state of battery charging thresholds. The technique has been integrated with six technical objectives and complex constraints. This approach helps to practical microgrid decision making and optimization of dynamic energy systems. The energy management process were also able to maximize photovoltaic production where minimizing power mismatch, stabilizing battery state of charge under different condition. The research results were also compared with the baseline system without optimization techniques, and a reliable outcome was found.
Regularized Model Predictive Control
In model predictive control (MPC), the choice of cost-weighting matrices and designing the Hessian matrix directly affects the trade-off between rapid state regulation and minimizing the control effort. However, traditional MPC in quadratic programming relies on fixed design matrices across the entire horizon, which can lead to suboptimal performance. This letter presents a Riccati equation-based method for adjusting the design matrix within the MPC framework, which enhances real-time performance. We employ a penalized least-squares (PLS) approach to derive a quadratic cost function for a discrete-time linear system over a finite prediction horizon. Using the method of weighting and enforcing the constraint equation by introducing a large penalty parameter, we solve the constrained optimization problem and generate control inputs for forward-shifted horizons. This process yields a recursive PLS-based Riccati equation that updates the design matrix as a regularization term in each shift, forming the foundation of the regularized MPC (Re-MPC) algorithm. To accomplish this, we provide a convergence and stability analysis of the developed algorithm. Numerical analysis demonstrates its superiority over traditional methods by allowing Riccati equation-based adjustments.
A Hybrid Systems Model of Feedback Optimization for Linear Systems
Feedback optimization algorithms compute inputs to a system in real time, which helps mitigate the effects of unknown disturbances. However, existing work models both system dynamics and computations in either discrete or continuous time, which does not faithfully model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems framework for modeling feedback optimization of linear time-invariant systems that are subject to unknown, constant disturbances. For this setup, we first establish the well-posedness of the hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then, our main result derives a convergence rate and an error bound for the full hybrid computation-in-theloop system and shows that it converges exponentially towards a ball of known radius about a desired fixed point. Simulation results show that this approach successfully mitigates the effects of disturbances, with the magnitude of steady-state error being 81% less than the magnitude of the disturbances in the system.
comment: 14 Pages, 3 Figures, submitted to Conference on Decision and Control 2025
Fast computation of the TGOSPA metric for multiple target tracking via unbalanced optimal transport
In multiple target tracking, it is important to be able to evaluate the performance of different tracking algorithms. The trajectory generalized optimal sub-pattern assignment metric (TGOSPA) is a recently proposed metric for such evaluations. The TGOSPA metric is computed as the solution to an optimization problem, but for large tracking scenarios, solving this problem becomes computationally demanding. In this paper, we present an approximation algorithm for evaluating the TGOSPA metric, based on casting the TGOSPA problem as an unbalanced multimarginal optimal transport problem. Following recent advances in computational optimal transport, we introduce an entropy regularization and derive an iterative scheme for solving the Lagrangian dual of the regularized problem. Numerical results suggest that our proposed algorithm is more computationally efficient than the alternative of computing the exact metric using a linear programming solver, while still providing an adequate approximation of the metric.
comment: 6 pages, 3 figures. Revision
Choose Wisely: Data-driven Predictive Control for Nonlinear Systems Using Online Data Selection
This paper proposes Select-Data-driven Predictive Control (Select-DPC), a new method for controlling nonlinear systems using output-feedback for which data are available but an explicit model is not. At each timestep, Select-DPC employs only the most relevant data to implicitly linearize the dynamics in "trajectory space". Then, taking user-defined output constraints into account, it makes control decisions using a convex optimization. This optimal control is applied in a receding-horizon manner. As the online data-selection is the core of Select-DPC, we propose and verify both norm-based and manifold-embedding-based selection methods. We evaluate Select-DPC on three benchmark nonlinear system simulators -- rocket-landing, a robotic arm and cart-pole inverted pendulum swing-up -- comparing them with standard Data-enabled Predictive Control (DeePC) and Time-Windowed DeePC methods, and find that Select-DPC outperforms both methods.
Distribution System Reconfiguration to Mitigate Load Altering Attacks via Stackelberg Games
The integration of IoT-controllable devices in power systems (such as smart electric vehicle charging stations, heat pumps, etc.), despite their benefits, raises novel cybersecurity concerns. Vulnerabilities in these devices can be leveraged to launch load-altering attacks (LAAs) that can potentially compromise the safety of power systems. In this paper, we analyze the impact of LAAs on the voltage profile of distribution networks (DNs). We first derive closed-form expressions to quantify the attacks' impact. Using the insights derived from this analysis, we then propose a reactive defense method to mitigate LAAs based on reconfiguring the DNs. We also study optimal defense strategies that are robust to LAAs by exploiting non-cooperative sequential game theory. The proposed solution takes into account the potential uncertainties in the attack localization. Furthermore, we propose a Bayesian optimization (BO) approach to compute the equilibrium of the game, which reduces the computational burden. Our results show that attacks launched on the deepest nodes in the DN have the most detrimental effect on the grid voltage profile. Furthermore, the proposed game-theoretic strategy successfully mitigates the effect of the attack while ensuring minimum system reconfiguration.
Distributed alternating gradient descent for convex semi-infinite programs over a network
This paper presents a first-order distributed algorithm for solving a convex semi-infinite program (SIP) over a time-varying network. In this setting, the objective function associated with the optimization problem is a summation of a set of functions, each held by one node in a network. The semi-infinite constraint, on the other hand, is known to all agents. The nodes collectively aim to solve the problem using local data about the objective and limited communication capabilities depending on the network topology. Our algorithm is built on three key ingredients: consensus step, gradient descent in the local objective, and local gradient descent iterations in the constraint at a node when the estimate violates the semi-infinite constraint. The algorithm is constructed, and its parameters are prescribed in such a way that the iterates held by each agent provably converge to an optimizer. That is, as the algorithm progresses, the estimates achieve consensus, and the constraint violation and the error in the optimal value are bounded above by vanishing terms. Simulation examples illustrate our results.
comment: 16 pages, 1 figure
Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots
Mobile smartphones compactly provide sensors such as cameras, IMUs, GNSS measurement units, and wireless and wired communication channels required for robotics projects. They are affordable, portable, and programmable, which makes them ideal for testing, data acquisition, controlling mobile robots, and many other robotic applications. A robotic system is proposed in this paper, consisting of an Android phone, a microcontroller board attached to the phone via USB, and a remote wireless controller station. In the data acquisition mode, the Android device can record a dataset of a diverse configuration of multiple cameras, IMUs, GNSS units, and external USB ADC channels in the rawest format used for, but not limited to, pose estimation and scene reconstruction applications. In robot control mode, the Android phone, a microcontroller board, and other peripherals constitute the mobile or stationary robotic system. This system is controlled using a remote server connected over Wi-Fi or Bluetooth. Experiments show that although the SLAM and AR applications can utilize the acquired data, the proposed system can pave the way for more advanced algorithms for processing these noisy and sporadic measurements. Moreover, the characteristics of the communication media are studied, and two example robotic projects, which involve controlling a toy car and a quadcopter, are included.
comment: Project repository: https://github.com/m-dayani/robo-platform Youtube Video: https://youtu.be/BTQ4yLB1bak Dataset: https://drive.google.com/drive/folders/1OZqdA1xa-SyJ64qL_TibqhtwhR1fWWrx?usp=sharing
Physics-Informed Neural Network-Based Control for Grid-Forming Converter's Stability Under Overload Conditions
Grid-forming converters (GFCs) are crucial for frequency and voltage stability in modern power systems. However, their performance under overload conditions remains a challenge. This paper highlights the limitations of existing approaches in managing DC source saturation and AC current limits, emphasizing the need for improved control strategies to ensure system stability. This paper proposes a control strategy based on a physics-informed neural network (PINN) to improve GFC performance under overloaded conditions, effectively preventing switch failures and mitigating DC source saturation. This approach outperforms conventional methods by maintaining stable voltage and frequency, even under significant load increase where traditional droop control alone proves inadequate. The post-disturbance operating point of GFCs remains unchanged using PINN-based control with an improvement of 0.245 Hz in frequency and 0.03 p.u. in active power when compared to an already existing current limitation strategy. Additionally, it reduces peak voltage deviations during transients by 24.14\%, lowers the rate of change of frequency (ROCOF) from 0.02 Hz/s to 0.005 Hz/s, and improves the rate of change of voltage (ROCOV), keeping both within acceptable limits. These improvements significantly enhance system resilience, especially in inertia-less power networks.
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that the clutch can deliver a peak holding force of 380 N with a 1.48 W consumption, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
Physics-informed machine learning for building performance simulation-A review of a nascent field
Building performance simulation (BPS) is critical for understanding building dynamics and behavior, analyzing performance of the built environment, optimizing energy efficiency, improving demand flexibility, and enhancing building resilience. However, conducting BPS is not trivial. Traditional BPS relies on an accurate building energy model, mostly physics-based, which depends heavily on detailed building information, expert knowledge, and case-by-case model calibrations, thereby significantly limiting their scalability. With the development of sensing technology and increased data availability, there is a growing attention and interest in data-driven BPS. However, purely data-driven models often suffer from limited generalization ability and a lack of physical consistency, resulting in poor performance in real-world applications. To address these limitations, recent studies have started to incorporate physics priors into data-driven models, a methodology called physics-informed machine learning (PIML). PIML is an emerging field with the definitions, methodologies, evaluation criteria, application scenarios, and future directions that remain open. To bridge those gaps, this study systematically reviews the state-of-art PIML for BPS, offering a comprehensive definition of PIML, and comparing it to traditional BPS approaches regarding data requirements, modeling effort, performance and computation cost. We also summarize the commonly used methodologies, validation approaches, application domains, available data sources, open-source packages and testbeds. In addition, this study provides a general guideline for selecting appropriate PIML models based on BPS applications. Finally, this study identifies key challenges and outlines future research directions, providing a solid foundation and valuable insights to advance R&D of PIML in BPS.
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
SINDyG: Sparse Identification of Nonlinear Dynamical Systems from Graph-Structured Data
The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. We showcase the application of our proposed method using several case studies of neuronal dynamics, where we model the macroscopic oscillation of a population of neurons using the extended Stuart-Landau (SL) equation and utilize the SINDyG method to identify the underlying nonlinear dynamics. Our extensive computational experiments validate the improved accuracy and simplicity of discovered network dynamics when compared to the original SINDy approach.
Safe PDE Boundary Control with Neural Operators
The physical world dynamics are generally governed by underlying partial differential equations (PDEs) with unknown analytical forms in science and engineering problems. Neural network based data-driven approaches have been heavily studied in simulating and solving PDE problems in recent years, but it is still challenging to move forward from understanding to controlling the unknown PDE dynamics. PDE boundary control instantiates a simplified but important problem by only focusing on PDE boundary conditions as the control input and output. However, current model-free PDE controllers cannot ensure the boundary output satisfies some given user-specified safety constraint. To this end, we propose a safety filtering framework to guarantee the boundary output stays within the safe set for current model-free controllers. Specifically, we first introduce a neural boundary control barrier function (BCBF) to ensure the feasibility of the trajectory-wise constraint satisfaction of boundary output. Based on the neural operator modeling the transfer function from boundary control input to output trajectories, we show that the change in the BCBF depends linearly on the change in input boundary, so quadratic programming-based safety filtering can be done for pre-trained model-free controllers. Extensive experiments under challenging hyperbolic, parabolic and Navier-Stokes PDE dynamics environments validate the plug-and-play effectiveness of the proposed method by achieving better general performance and boundary constraint satisfaction compared to the vanilla and constrained model-free controller baselines. The code is available at https://github.com/intelligent-control-lab/safe-pde-control.
comment: L4DC 2025 full version, 28 pages, 5 figures, 8 tables
Robotics
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can enhance autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods over-rely on LLM outputs, which are prone to hallucinations.Evaluations show that state-of-the-art LLM indicates a non-hallucination rate of only approximately 57.95% when assessed on essential driving-related tasks. Thus, in these methods, hallucinations from the LLM can directly jeopardize the performance of driving policies. This paper argues that maintaining relative independence between the LLM and the RL is vital for solving the hallucinations problem. Consequently, this paper is devoted to propose a novel LLM-Hinted RL paradigm. The LLM is used to generate semantic hints for state augmentation and policy optimization to assist RL agent in motion planning, while the RL agent counteracts potential erroneous semantic indications through policy learning to achieve excellent driving performance. Based on this paradigm, we propose the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which is designed that includes Augmented Semantic Representation Module to extend state space. Contextual Stability Anchor Module enhances the reliability of multi-critic weight hints by utilizing information from the knowledge base. Semantic Cache Module is employed to seamlessly integrate LLM low-frequency guidance with RL high-frequency control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance. HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities. Under safety-critical driving conditions, HCRMP significantly reduces the collision rate by 11.4%, which effectively improves the driving performance in complex scenarios.
Improving planning and MBRL with temporally-extended actions
Continuous time systems are often modeled using discrete time dynamics but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon which leads to computationally demanding planning problems and reduced performance. Previous work in model free reinforcement learning has partially addressed this issue using action repeats where a policy is learned to determine a discrete action duration. Instead we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation both in planning and in MBRL, shows that our approach yields faster planning, better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control. It includes a task formulation, a large-scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert-level flight trajectories of human pilots and supports direct deployment without sim-to-real gap. We conduct extensive experiments on UAV-Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine-grained Flow setting.
From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems
Foundation models (FMs) are increasingly used to bridge language and action in embodied agents, yet the operational characteristics of different FM integration strategies remain under-explored -- particularly for complex instruction following and versatile action generation in changing environments. This paper examines three paradigms for building robotic systems: end-to-end vision-language-action (VLA) models that implicitly integrate perception and planning, and modular pipelines incorporating either vision-language models (VLMs) or multimodal large language models (LLMs). We evaluate these paradigms through two focused case studies: a complex instruction grounding task assessing fine-grained instruction understanding and cross-modal disambiguation, and an object manipulation task targeting skill transfer via VLA finetuning. Our experiments in zero-shot and few-shot settings reveal trade-offs in generalization and data efficiency. By exploring performance limits, we distill design implications for developing language-driven physical agents and outline emerging challenges and opportunities for FM-powered robotics in real-world conditions.
comment: 17 pages, 13 figures
SwarmDiff: Swarm Robotic Trajectory Planning in Cluttered Environments via Diffusion Transformer
Swarm robotic trajectory planning faces challenges in computational efficiency, scalability, and safety, particularly in complex, obstacle-dense environments. To address these issues, we propose SwarmDiff, a hierarchical and scalable generative framework for swarm robots. We model the swarm's macroscopic state using Probability Density Functions (PDFs) and leverage conditional diffusion models to generate risk-aware macroscopic trajectory distributions, which then guide the generation of individual robot trajectories at the microscopic level. To ensure a balance between the swarm's optimal transportation and risk awareness, we integrate Wasserstein metrics and Conditional Value at Risk (CVaR). Additionally, we introduce a Diffusion Transformer (DiT) to improve sampling efficiency and generation quality by capturing long-range dependencies. Extensive simulations and real-world experiments demonstrate that SwarmDiff outperforms existing methods in computational efficiency, trajectory validity, and scalability, making it a reliable solution for swarm robotic trajectory planning.
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
The generalization capabilities of vision-language-action (VLA) models to unseen tasks are crucial to achieving general-purpose robotic manipulation in open-world settings. However, the cross-task generalization capabilities of existing VLA models remain significantly underexplored. To address this gap, we introduce AGNOSTOS, a novel simulation benchmark designed to rigorously evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS comprises 23 unseen manipulation tasks for testing, distinct from common training task distributions, and incorporates two levels of generalization difficulty to assess robustness. Our systematic evaluation reveals that current VLA models, despite being trained on diverse datasets, struggle to generalize effectively to these unseen tasks. To overcome this limitation, we propose Cross-Task In-Context Manipulation (X-ICM), a method that conditions large language models (LLMs) on in-context demonstrations from seen tasks to predict action sequences for unseen tasks. Additionally, we introduce a dynamics-guided sample selection strategy that identifies relevant demonstrations by capturing cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and X-ICM will serve as valuable tools for advancing general-purpose robotic manipulation.
comment: Project Page: https://jiaming-zhou.github.io/AGNOSTOS
FLARE: Robot Learning with Implicit World Modeling
We introduce $\textbf{F}$uture $\textbf{LA}$tent $\textbf{RE}$presentation Alignment ($\textbf{FLARE}$), a novel framework that integrates predictive latent world modeling into robot policy learning. By aligning features from a diffusion transformer with latent embeddings of future observations, $\textbf{FLARE}$ enables a diffusion transformer policy to anticipate latent representations of future observations, allowing it to reason about long-term consequences while generating actions. Remarkably lightweight, $\textbf{FLARE}$ requires only minimal architectural modifications -- adding a few tokens to standard vision-language-action (VLA) models -- yet delivers substantial performance gains. Across two challenging multitask simulation imitation learning benchmarks spanning single-arm and humanoid tabletop manipulation, $\textbf{FLARE}$ achieves state-of-the-art performance, outperforming prior policy learning baselines by up to 26%. Moreover, $\textbf{FLARE}$ unlocks the ability to co-train with human egocentric video demonstrations without action labels, significantly boosting policy generalization to a novel object with unseen geometry with as few as a single robot demonstration. Our results establish $\textbf{FLARE}$ as a general and scalable approach for combining implicit world modeling with high-frequency robotic control.
comment: Project Webpage / Blogpost: https://research.nvidia.com/labs/gear/flare
World Models as Reference Trajectories for Rapid Motor Adaptation
Deploying learned control policies in real-world environments poses a fundamental challenge. When system dynamics change unexpectedly, performance degrades until models are retrained on new data. We introduce Reflexive World Models (RWM), a dual control framework that uses world model predictions as implicit reference trajectories for rapid adaptation. Our method separates the control problem into long-term reward maximization through reinforcement learning and robust motor execution through rapid latent control. This dual architecture achieves significantly faster adaptation with low online computational cost compared to model-based RL baselines, while maintaining near-optimal performance. The approach combines the benefits of flexible policy learning through reinforcement learning with rapid error correction capabilities, providing a principled approach to maintaining performance in high-dimensional continuous control tasks under varying dynamics.
Robo-DM: Data Management For Large Robot Datasets ICRA 2025
Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories, which typically consist of video, textual, and numerical modalities - including streams from multiple cameras - remains challenging. We propose Robo-DM, an efficient open-source cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format with Extensible Binary Meta Language (EBML). Robo-DM can significantly reduce the size of robot trajectory data, transfer costs, and data load time during training. Compared to the RLDS format used in OXE datasets, Robo-DM's compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster when decoding sequentially. We physically evaluate a model trained by Robo-DM with lossy compression, a pick-and-place task, and In-Context Robot Transformer. Robo-DM uses 75x compression of the original dataset and does not suffer reduction in downstream task accuracy.
comment: Best paper finalist of IEEE ICRA 2025
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets
Vision-Language Models (VLMs) acquire real-world knowledge and general reasoning ability through Internet-scale image-text corpora. They can augment robotic systems with scene understanding and task planning, and assist visuomotor policies that are trained on robot trajectory data. We explore the reverse paradigm - using rich, real, multi-modal robot trajectory data to enhance and evaluate VLMs. In this paper, we present Robo2VLM, a Visual Question Answering (VQA) dataset generation framework for VLMs. Given a human tele-operated robot trajectory, Robo2VLM derives ground-truth from non-visual and non-descriptive sensory modalities, such as end-effector pose, gripper aperture, and force sensing. Based on these modalities, it segments the robot trajectory into a sequence of manipulation phases. At each phase, Robo2VLM uses scene and interaction understanding to identify 3D properties of the robot, task goal, and the target object. The properties are used to generate representative VQA queries - images with textural multiple-choice questions - based on spatial, goal-conditioned, and interaction reasoning question templates. We curate Robo2VLM-1, a large-scale in-the-wild dataset with 684,710 questions covering 463 distinct scenes and 3,396 robotic manipulation tasks from 176k real robot trajectories. Results suggest that Robo2VLM-1 can benchmark and improve VLM capabilities in spatial and interaction reasoning.
Coloring Between the Lines: Personalization in the Null Space of Planning Constraints
Generalist robots must personalize in-the-wild to meet the diverse needs and preferences of long-term users. How can we enable flexible personalization without sacrificing safety or competency? This paper proposes Coloring Between the Lines (CBTL), a method for personalization that exploits the null space of constraint satisfaction problems (CSPs) used in robot planning. CBTL begins with a CSP generator that ensures safe and competent behavior, then incrementally personalizes behavior by learning parameterized constraints from online interaction. By quantifying uncertainty and leveraging the compositionality of planning constraints, CBTL achieves sample-efficient adaptation without environment resets. We evaluate CBTL in (1) three diverse simulation environments; (2) a web-based user study; and (3) a real-robot assisted feeding system, finding that CBTL consistently achieves more effective personalization with fewer interactions than baselines. Our results demonstrate that CBTL provides a unified and practical approach for continual, flexible, active, and safe robot personalization. Website: https://emprise.cs.cornell.edu/cbtl/
Synthetic Enclosed Echoes: A New Dataset to Mitigate the Gap Between Simulated and Real-World Sonar Data
This paper introduces Synthetic Enclosed Echoes (SEE), a novel dataset designed to enhance robot perception and 3D reconstruction capabilities in underwater environments. SEE comprises high-fidelity synthetic sonar data, complemented by a smaller subset of real-world sonar data. To facilitate flexible data acquisition, a simulated environment has been developed, enabling the generation of additional data through modifications such as the inclusion of new structures or imaging sonar configurations. This hybrid approach leverages the advantages of synthetic data, including readily available ground truth and the ability to generate diverse datasets, while bridging the simulation-to-reality gap with real-world data acquired in a similar environment. The SEE dataset comprehensively evaluates acoustic data-based methods, including mathematics-based sonar approaches and deep learning algorithms. These techniques were employed to validate the dataset, confirming its suitability for underwater 3D reconstruction. Furthermore, this paper proposes a novel modification to a state-of-the-art algorithm, demonstrating improved performance compared to existing methods. The SEE dataset enables the evaluation of acoustic data-based methods in realistic scenarios, thereby improving their feasibility for real-world underwater applications.
comment: This work has been submitted to the IEEE for possible publication
Guided Policy Optimization under Partial Observability
Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. While additional information, such as that available in simulations, can enhance training, effectively leveraging it remains an open problem. To address this, we introduce Guided Policy Optimization (GPO), a framework that co-trains a guider and a learner. The guider takes advantage of privileged information while ensuring alignment with the learner's policy that is primarily trained via imitation learning. We theoretically demonstrate that this learning scheme achieves optimality comparable to direct RL, thereby overcoming key limitations inherent in existing approaches. Empirical evaluations show strong performance of GPO across various tasks, including continuous control with partial observability and noise, and memory-based challenges, significantly outperforming existing methods.
comment: 24 pages, 13 figures
Evaluation of Mobile Environment for Vehicular Visible Light Communication Using Multiple LEDs and Event Cameras
In the fields of Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD), sensors that serve as the ``eyes'' for sensing the vehicle's surrounding environment are essential. Traditionally, image sensors and LiDAR have played this role. However, a new type of vision sensor, event cameras, has recently attracted attention. Event cameras respond to changes in the surrounding environment (e.g., motion), exhibit strong robustness against motion blur, and perform well in high dynamic range environments, which are desirable in robotics applications. Furthermore, the asynchronous and low-latency principles of data acquisition make event cameras suitable for optical communication. By adding communication functionality to event cameras, it becomes possible to utilize I2V communication to immediately share information about forward collisions, sudden braking, and road conditions, thereby contributing to hazard avoidance. Additionally, receiving information such as signal timing and traffic volume enables speed adjustment and optimal route selection, facilitating more efficient driving. In this study, we construct a vehicle visible light communication system where event cameras are receivers, and multiple LEDs are transmitters. In driving scenes, the system tracks the transmitter positions and separates densely packed LED light sources using pilot sequences based on Walsh-Hadamard codes. As a result, outdoor vehicle experiments demonstrate error-free communication under conditions where the transmitter-receiver distance was within 40 meters and the vehicle's driving speed was 30 km/h (8.3 m/s).
comment: 6 pages, IEEE IV 2025 conference paper
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation
Mapping and understanding complex 3D environments is fundamental to how autonomous systems perceive and interact with the physical world, requiring both precise geometric reconstruction and rich semantic comprehension. While existing 3D semantic mapping systems excel at reconstructing and identifying predefined object instances, they lack the flexibility to efficiently build semantic maps with open-vocabulary during online operation. Although recent vision-language models have enabled open-vocabulary object recognition in 2D images, they haven't yet bridged the gap to 3D spatial understanding. The critical challenge lies in developing a training-free unified system that can simultaneously construct accurate 3D maps while maintaining semantic consistency and supporting natural language interactions in real time. In this paper, we develop a zero-shot framework that seamlessly integrates GPU-accelerated geometric reconstruction with open-vocabulary vision-language models through online instance-level semantic embedding fusion, guided by hierarchical object association with spatial indexing. Our training-free system achieves superior performance through incremental processing and unified geometric-semantic updates, while robustly handling 2D segmentation inconsistencies. The proposed general-purpose 3D scene understanding framework can be used for various tasks including zero-shot 3D instance retrieval, segmentation, and object detection to reason about previously unseen objects and interpret natural language queries. The project page is available at https://razer-3d.github.io.
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning. To overcome this, we introduce \textbf{AgentThink}, a pioneering unified framework that, for the first time, integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks. AgentThink's core innovations include: \textbf{(i) Structured Data Generation}, by establishing an autonomous driving tool library to automatically construct structured, self-verified reasoning data explicitly incorporating tool usage for diverse driving scenarios; \textbf{(ii) A Two-stage Training Pipeline}, employing Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to equip VLMs with the capability for autonomous tool invocation; and \textbf{(iii) Agent-style Tool-Usage Evaluation}, introducing a novel multi-tool assessment protocol to rigorously evaluate the model's tool invocation and utilization. Experiments on the DriveLMM-o1 benchmark demonstrate AgentThink significantly boosts overall reasoning scores by \textbf{53.91\%} and enhances answer accuracy by \textbf{33.54\%}, while markedly improving reasoning quality and consistency. Furthermore, ablation studies and robust zero-shot/few-shot generalization experiments across various benchmarks underscore its powerful capabilities. These findings highlight a promising trajectory for developing trustworthy and tool-aware autonomous driving models.
comment: 18 pages, 8 figures
R3GS: Gaussian Splatting for Robust Reconstruction and Relocalization in Unconstrained Image Collections
We propose R3GS, a robust reconstruction and relocalization framework tailored for unconstrained datasets. Our method uses a hybrid representation during training. Each anchor combines a global feature from a convolutional neural network (CNN) with a local feature encoded by the multiresolution hash grids [2]. Subsequently, several shallow multi-layer perceptrons (MLPs) predict the attributes of each Gaussians, including color, opacity, and covariance. To mitigate the adverse effects of transient objects on the reconstruction process, we ffne-tune a lightweight human detection network. Once ffne-tuned, this network generates a visibility map that efffciently generalizes to other transient objects (such as posters, banners, and cars) with minimal need for further adaptation. Additionally, to address the challenges posed by sky regions in outdoor scenes, we propose an effective sky-handling technique that incorporates a depth prior as a constraint. This allows the inffnitely distant sky to be represented on the surface of a large-radius sky sphere, signiffcantly reducing ffoaters caused by errors in sky reconstruction. Furthermore, we introduce a novel relocalization method that remains robust to changes in lighting conditions while estimating the camera pose of a given image within the reconstructed 3DGS scene. As a result, R3GS significantly enhances rendering ffdelity, improves both training and rendering efffciency, and reduces storage requirements. Our method achieves state-of-the-art performance compared to baseline methods on in-the-wild datasets. The code will be made open-source following the acceptance of the paper.
comment: 7 pages, 4 figures
Learning-based Autonomous Oversteer Control and Collision Avoidance
Oversteer, wherein a vehicle's rear tires lose traction and induce unintentional excessive yaw, poses critical safety challenges. Failing to control oversteer often leads to severe traffic accidents. Although recent autonomous driving efforts have attempted to handle oversteer through stabilizing maneuvers, the majority rely on expert-defined trajectories or assume obstacle-free environments, limiting real-world applicability. This paper introduces a novel end-to-end (E2E) autonomous driving approach that tackles oversteer control and collision avoidance simultaneously. Existing E2E techniques, including Imitation Learning (IL), Reinforcement Learning (RL), and Hybrid Learning (HL), generally require near-optimal demonstrations or extensive experience. Yet even skilled human drivers struggle to provide perfect demonstrations under oversteer, and high transition variance hinders accumulating sufficient data. Hence, we present Q-Compared Soft Actor-Critic (QC-SAC), a new HL algorithm that effectively learns from suboptimal demonstration data and adapts rapidly to new conditions. To evaluate QC-SAC, we introduce a benchmark inspired by real-world driver training: a vehicle encounters sudden oversteer on a slippery surface and must avoid randomly placed obstacles ahead. Experimental results show QC-SAC attains near-optimal driving policies, significantly surpassing state-of-the-art IL, RL, and HL baselines. Our method demonstrates the world's first safe autonomous oversteer control with obstacle avoidance.
GCNT: Graph-Based Transformer Policies for Morphology-Agnostic Reinforcement Learning
Training a universal controller for robots with different morphologies is a promising research trend, since it can significantly enhance the robustness and resilience of the robotic system. However, diverse morphologies can yield different dimensions of state space and action space, making it difficult to comply with traditional policy networks. Existing methods address this issue by modularizing the robot configuration, while do not adequately extract and utilize the overall morphological information, which has been proven crucial for training a universal controller. To this end, we propose GCNT, a morphology-agnostic policy network based on improved Graph Convolutional Network (GCN) and Transformer. It exploits the fact that GCN and Transformer can handle arbitrary number of modules to achieve compatibility with diverse morphologies. Our key insight is that the GCN is able to efficiently extract morphology information of robots, while Transformer ensures that it is fully utilized by allowing each node of the robot to communicate this information directly. Experimental results show that our method can generate resilient locomotion behaviors for robots with different configurations, including zero-shot generalization to robot morphologies not seen during training. In particular, GCNT achieved the best performance on 8 tasks in the 2 standard benchmarks.
EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy
In endoscopic procedures, autonomous tracking of abnormal regions and following circumferential cutting markers can significantly reduce the cognitive burden on endoscopists. However, conventional model-based pipelines are fragile for each component (e.g., detection, motion planning) requires manual tuning and struggles to incorporate high-level endoscopic intent, leading to poor generalization across diverse scenes. Vision-Language-Action (VLA) models, which integrate visual perception, language grounding, and motion planning within an end-to-end framework, offer a promising alternative by semantically adapting to surgeon prompts without manual recalibration. Despite their potential, applying VLA models to robotic endoscopy presents unique challenges due to the complex and dynamic anatomical environments of the gastrointestinal (GI) tract. To address this, we introduce EndoVLA, designed specifically for continuum robots in GI interventions. Given endoscopic images and surgeon-issued tracking prompts, EndoVLA performs three core tasks: (1) polyp tracking, (2) delineation and following of abnormal mucosal regions, and (3) adherence to circular markers during circumferential cutting. To tackle data scarcity and domain shifts, we propose a dual-phase strategy comprising supervised fine-tuning on our EndoVLA-Motion dataset and reinforcement fine-tuning with task-aware rewards. Our approach significantly improves tracking performance in endoscopy and enables zero-shot generalization in diverse scenes and complex sequential tasks.
Cascaded Diffusion Models for Neural Motion Planning ICRA'25
Robots in the real world need to perceive and move to goals in complex environments without collisions. Avoiding collisions is especially difficult when relying on sensor perception and when goals are among clutter. Diffusion policies and other generative models have shown strong performance in solving local planning problems, but often struggle at avoiding all of the subtle constraint violations that characterize truly challenging global motion planning problems. In this work, we propose an approach for learning global motion planning using diffusion policies, allowing the robot to generate full trajectories through complex scenes and reasoning about multiple obstacles along the path. Our approach uses cascaded hierarchical models which unify global prediction and local refinement together with online plan repair to ensure the trajectories are collision free. Our method outperforms (by ~5%) a wide variety of baselines on challenging tasks in multiple domains including navigation and manipulation.
comment: ICRA'25
Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation
Robot manipulation learning from human demonstrations offers a rapid means to acquire skills but often lacks generalization across diverse scenes and object placements. This limitation hinders real-world applications, particularly in complex tasks requiring dexterous manipulation. Vision-Language-Action (VLA) paradigm leverages large-scale data to enhance generalization. However, due to data scarcity, VLA's performance remains limited. In this work, we introduce Object-Focus Actor (OFA), a novel, data-efficient approach for generalized dexterous manipulation. OFA exploits the consistent end trajectories observed in dexterous manipulation tasks, allowing for efficient policy training. Our method employs a hierarchical pipeline: object perception and pose estimation, pre-manipulation pose arrival and OFA policy execution. This process ensures that the manipulation is focused and efficient, even in varied backgrounds and positional layout. Comprehensive real-world experiments across seven tasks demonstrate that OFA significantly outperforms baseline methods in both positional and background generalization tests. Notably, OFA achieves robust performance with only 10 demonstrations, highlighting its data efficiency.
Learning-based Airflow Inertial Odometry for MAVs using Thermal Anemometers in a GPS and vision denied environment
This work demonstrates an airflow inertial based odometry system with multi-sensor data fusion, including thermal anemometer, IMU, ESC, and barometer. This goal is challenging because low-cost IMUs and barometers have significant bias, and anemometer measurements are very susceptible to interference from spinning propellers and ground effects. We employ a GRU-based deep neural network to estimate relative air speed from noisy and disturbed anemometer measurements, and an observer with bias model to fuse the sensor data and thus estimate the state of aerial vehicle. A complete flight data, including takeoff and landing on the ground, shows that the approach is able to decouple the downwash induced wind speed caused by propellers and the ground effect, and accurately estimate the flight speed in a wind-free indoor environment. IMU, and barometer bias are effectively estimated, which significantly reduces the position integration drift, which is only 5.7m for 203s manual random flight. The open source is available on https://github.com/SyRoCo-ISIR/Flight-Speed-Estimation-Airflow.
Histo-Planner: A Real-time Local Planner for MAVs Teleoperation based on Histogram of Obstacle Distribution
This paper concerns real-time obstacle avoidance for micro aerial vehicles (MAVs). Motivated by teleoperation applications in cluttered environments with limited computational power, we propose a local planner that does not require the knowledge or construction of a global map of the obstacles. The proposed solution consists of a real-time trajectory planning algorithm that relies on the histogram of obstacle distribution and a planner manager that triggers different planning modes depending on obstacles location around the MAV. The proposed solution is validated, for a teleoperation application, with both simulations and indoor experiments. Benchmark comparisons based on a designed simulation platform are also provided.
Fault-Tolerant Multi-Robot Coordination with Limited Sensing within Confined Environments
As robots are increasingly deployed to collaborate on tasks within shared workspaces and resources, the failure of an individual robot can critically affect the group's performance. This issue is particularly challenging when robots lack global information or direct communication, relying instead on social interaction for coordination and to complete their tasks. In this study, we propose a novel fault-tolerance technique leveraging physical contact interactions in multi-robot systems, specifically under conditions of limited sensing and spatial confinement. We introduce the "Active Contact Response" (ACR) method, where each robot modulates its behavior based on the likelihood of encountering an inoperative (faulty) robot. Active robots are capable of collectively repositioning stationary and faulty peers to reduce obstructions and maintain optimal group functionality. We implement our algorithm in a team of autonomous robots, equipped with contact-sensing and collision-tolerance capabilities, tasked with collectively excavating cohesive model pellets. Experimental results indicate that the ACR method significantly improves the system's recovery time from robot failures, enabling continued collective excavation with minimal performance degradation. Thus, this work demonstrates the potential of leveraging local, social, and physical interactions to enhance fault tolerance and coordination in multi-robot systems operating in constrained and extreme environments.
comment: 15 pages, 4 figures. Accepted to DARS 2024 (Distributed Autonomous Robotic Systems), to appear in Springer Proceedings in Advanced Robotics
Toward Task Capable Active Matter: Learning to Avoid Clogging in Confined Collectives via Collisions
Social organisms which construct nests consisting of tunnels and chambers necessarily navigate confined and crowded conditions. Unlike low-density collectives like bird flocks and insect swarms, in which hydrodynamic and statistical phenomena dominate, the physics of glasses and supercooled fluids is important to understand clogging behaviors in high-density collectives. Our previous work revealed that fire ants flowing in confined tunnels utilize diverse behaviors like unequal workload distributions, spontaneous direction reversals, and limited interaction times to mitigate clogging and jamming and thus maintain functional flow; implementation of similar rules in a small robophysical swarm led to high performance through spontaneous dissolution of clogs and clusters. However, how the insects learn such behaviors, and how we can develop "task capable" active matter in such regimes, remains a challenge in part because interaction dynamics are dominated by local, time-consuming collisions and no single agent can guide the entire collective. Here, we hypothesized that effective flow and clog mitigation could emerge purely through local learning. We tasked small groups of robots with pellet excavation in a narrow tunnel, allowing them to modify reversal probabilities over time. Initially, robots had equal probabilities and clogs were common. Reversals improved flow. When reversal probabilities adapted via collisions and noisy tunnel length estimates, workload inequality and performance improved. Our robophysical study of an excavating swarm shows that, despite the seeming complexity and difficulty of the task, simple learning rules can mitigate or leverage unavoidable features in task-capable dense active matter, leading to hypotheses for dense biological and robotic swarms.
comment: 13 pages, 9 figures. Published in Frontiers in Physics, Social Physics section. Includes experimental and simulation analysis of multi-robot excavation using decentralized learning
Shape-Adaptive Planning and Control for a Deformable Quadrotor
Drones have become essential in various applications, but conventional quadrotors face limitations in confined spaces and complex tasks. Deformable drones, which can adapt their shape in real-time, offer a promising solution to overcome these challenges, while also enhancing maneuverability and enabling novel tasks like object grasping. This paper presents a novel approach to autonomous motion planning and control for deformable quadrotors. We introduce a shape-adaptive trajectory planner that incorporates deformation dynamics into path generation, using a scalable kinodynamic A* search to handle deformation parameters in complex environments. The backend spatio-temporal optimization is capable of generating optimally smooth trajectories that incorporate shape deformation. Additionally, we propose an enhanced control strategy that compensates for external forces and torque disturbances, achieving a 37.3\% reduction in trajectory tracking error compared to our previous work. Our approach is validated through simulations and real-world experiments, demonstrating its effectiveness in narrow-gap traversal and multi-modal deformable tasks.
UniSTPA: A Safety Analysis Framework for End-to-End Autonomous Driving
As autonomous driving technology continues to advance, end-to-end models have attracted considerable attention owing to their superior generalisation capability. Nevertheless, such learning-based systems entail numerous safety risks throughout development and on-road deployment, and existing safety-analysis methods struggle to identify these risks comprehensively. To address this gap, we propose the Unified System Theoretic Process Analysis (UniSTPA) framework, which extends the scope of STPA from the operational phase to the entire lifecycle of an end-to-end autonomous driving system, including information gathering, data preparation, closed loop training, verification, and deployment. UniSTPA performs hazard analysis not only at the component level but also within the model's internal layers, thereby enabling fine-grained assessment of inter and intra module interactions. Using a highway Navigate on Autopilot function as a case study, UniSTPA uncovers multi-stage hazards overlooked by conventional approaches including scene design defects, sensor fusion biases, and internal model flaws, through multi-level causal analysis, traces these hazards to deeper issues such as data quality, network architecture, and optimisation objectives. The analysis result are used to construct a safety monitoring and safety response mechanism that supports continuous improvement from hazard identification to system optimisation. The proposed framework thus offers both theoretical and practical guidance for the safe development and deployment of end-to-end autonomous driving systems.
AnyBody: A Benchmark Suite for Cross-Embodiment Manipulation
Generalizing control policies to novel embodiments remains a fundamental challenge in enabling scalable and transferable learning in robotics. While prior works have explored this in locomotion, a systematic study in the context of manipulation tasks remains limited, partly due to the lack of standardized benchmarks. In this paper, we introduce a benchmark for learning cross-embodiment manipulation, focusing on two foundational tasks-reach and push-across a diverse range of morphologies. The benchmark is designed to test generalization along three axes: interpolation (testing performance within a robot category that shares the same link structure), extrapolation (testing on a robot with a different link structure), and composition (testing on combinations of link structures). On the benchmark, we evaluate the ability of different RL policies to learn from multiple morphologies and to generalize to novel ones. Our study aims to answer whether morphology-aware training can outperform single-embodiment baselines, whether zero-shot generalization to unseen morphologies is feasible, and how consistently these patterns hold across different generalization regimes. The results highlight the current limitations of multi-embodiment learning and provide insights into how architectural and training design choices influence policy generalization.
Toward Informed AV Decision-Making: Computational Model of Well-being and Trust in Mobility
For future human-autonomous vehicle (AV) interactions to be effective and smooth, human-aware systems that analyze and align human needs with automation decisions are essential. Achieving this requires systems that account for human cognitive states. We present a novel computational model in the form of a Dynamic Bayesian Network (DBN) that infers the cognitive states of both AV users and other road users, integrating this information into the AV's decision-making process. Specifically, our model captures the well-being of both an AV user and an interacting road user as cognitive states alongside trust. Our DBN models infer beliefs over the AV user's evolving well-being, trust, and intention states, as well as the possible well-being of other road users, based on observed interaction experiences. Using data collected from an interaction study, we refine the model parameters and empirically assess its performance. Finally, we extend our model into a causal inference model (CIM) framework for AV decision-making, enabling the AV to enhance user well-being and trust while balancing these factors with its own operational costs and the well-being of interacting road users. Our evaluation demonstrates the model's effectiveness in accurately predicting user's states and guiding informed, human-centered AV decisions.
Motion Priors Reimagined: Adapting Flat-Terrain Skills for Complex Quadruped Mobility
Reinforcement learning (RL)-based legged locomotion controllers often require meticulous reward tuning to track velocities or goal positions while preserving smooth motion on various terrains. Motion imitation methods via RL using demonstration data reduce reward engineering but fail to generalize to novel environments. We address this by proposing a hierarchical RL framework in which a low-level policy is first pre-trained to imitate animal motions on flat ground, thereby establishing motion priors. A subsequent high-level, goal-conditioned policy then builds on these priors, learning residual corrections that enable perceptive locomotion, local obstacle avoidance, and goal-directed navigation across diverse and rugged terrains. Simulation experiments illustrate the effectiveness of learned residuals in adapting to progressively challenging uneven terrains while still preserving the locomotion characteristics provided by the motion priors. Furthermore, our results demonstrate improvements in motion regularization over baseline models trained without motion priors under similar reward setups. Real-world experiments with an ANYmal-D quadruped robot confirm our policy's capability to generalize animal-like locomotion skills to complex terrains, demonstrating smooth and efficient locomotion and local navigation performance amidst challenging terrains with obstacles.
WaveTouch: Active Tactile Sensing Using Vibro-Feedback for Classification of Variable Stiffness and Infill Density Objects
The perception and recognition of the surroundings is one of the essential tasks for a robot. With preliminary knowledge about a target object, it can perform various manipulation tasks such as rolling motion, palpation, and force control. Minimizing possible damage to the sensing system and testing objects during manipulation are significant concerns that persist in existing research solutions. To address this need, we designed a new type of tactile sensor based on the active vibro-feedback for object stiffness classification. With this approach, the classification can be performed during the gripping process, enabling the robot to quickly estimate the appropriate level of gripping force required to avoid damaging or dropping the object. This contrasts with passive vibration sensing, which requires to be triggered by object movement and is often inefficient for establishing a secure grip. The main idea is to observe the received changes in artificially injected vibrations that propagate through objects with different physical properties and molecular structures. The experiments with soft subjects demonstrated higher absorption of the received vibrations, while the opposite is true for the rigid subjects that not only demonstrated low absorption but also enhancement of the vibration signal.
comment: Presented at RSS 2024 Workshop "Noosphere: Tactile Sensing for General Purpose Robot Learning"
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
Reference Free Platform Adaptive Locomotion for Quadrupedal Robots using a Dynamics Conditioned Policy
This article presents Platform Adaptive Locomotion (PAL), a unified control method for quadrupedal robots with different morphologies and dynamics. We leverage deep reinforcement learning to train a single locomotion policy on procedurally generated robots. The policy maps proprioceptive robot state information and base velocity commands into desired joint actuation targets, which are conditioned using a latent embedding of the temporally local system dynamics. We explore two conditioning strategies - one using a GRU-based dynamics encoder and another using a morphology-based property estimator - and show that morphology-aware conditioning outperforms temporal dynamics encoding regarding velocity task tracking for our hardware test on ANYmal C. Our results demonstrate that both approaches achieve robust zero-shot transfer across multiple unseen simulated quadrupeds. Furthermore, we demonstrate the need for careful robot reference modelling during training, enabling us to reduce the velocity tracking error by up to 30% compared to the baseline method. Despite PAL not surpassing the best-performing reference-free controller in all cases, our analysis uncovers critical design choices and informs improvements to the state of the art.
comment: 8 pages, 6 tables, 5 figures
Integrating Robotic Navigation with Blockchain: A Novel PoS-Based Approach for Heterogeneous Robotic Teams
This work explores a novel integration of blockchain methodologies with Wide Area Visual Navigation (WAVN) to address challenges in visual navigation for a heterogeneous team of mobile robots deployed for unstructured applications in agriculture, forestry, etc. Focusing on overcoming challenges such as GPS independence, environmental changes, and computational limitations, the study introduces the Proof of Stake (PoS) mechanism, commonly used in blockchain systems, into the WAVN framework \cite{Lyons_2022}. This integration aims to enhance the cooperative navigation capabilities of robotic teams by prioritizing robot contributions based on their navigation reliability. The methodology involves a stake weight function, consensus score with PoS, and a navigability function, addressing the computational complexities of robotic cooperation and data validation. This innovative approach promises to optimize robotic teamwork by leveraging blockchain principles, offering insights into the scalability, efficiency, and overall system performance. The project anticipates significant advancements in autonomous navigation and the broader application of blockchain technology beyond its traditional financial context.
comment: 4 pages, 4 figures, presented at 2024 21st International Conference on Ubiquitous Robots (UR) June 24 - 27, 2024, New York University, USA
Human Workload Prediction: Lag Horizon Selection
Human-robot teams must be aware of human workload when operating in uncertain, dynamic environments. Prior work employed physiological response metrics from wearable sensors to estimate the current human workload; however, these estimates only enable robots to respond to under- or overload conditions reactively. Current human workload prediction approaches are limited to short prediction horizons and fail to investigate variable lag horizons' impact on predictions. This letter investigates the impact of lag horizons on both univariate and multivariate time series forecasting models for human workload prediction. A key finding is that univariate predictions required longer lag horizons of 240 seconds (s), whereas multivariate workload predictions sufficed with shorter lag horizons with diminishing returns around 120s.
comment: 4 pages, 1 figures, Submitted to the IEEE for possible publication
VERDI: VLM-Embedded Reasoning for Autonomous Driving
While autonomous driving (AD) stacks struggle with decision making under partial observability and real-world complexity, human drivers are capable of commonsense reasoning to make near-optimal decisions with limited information. Recent work has attempted to leverage finetuned Vision-Language Models (VLMs) for trajectory planning at inference time to emulate human behavior. Despite their success in benchmark evaluations, these methods are often impractical to deploy (a 70B parameter VLM inference at merely 8 tokens per second requires more than 160G of memory), and their monolithic network structure prohibits safety decomposition. To bridge this gap, we propose VLM-Embedded Reasoning for autonomous Driving (VERDI), a training-time framework that distills the reasoning process and commonsense knowledge of VLMs into the AD stack. VERDI augments modular differentiable end-to-end (e2e) AD models by aligning intermediate module outputs at the perception, prediction, and planning stages with text features explaining the driving reasoning process produced by VLMs. By encouraging alignment in latent space, \textsc{VERDI} enables the modular AD stack to internalize structured reasoning, without incurring the inference-time costs of large VLMs. We demonstrate the effectiveness of our method on the NuScenes dataset and find that VERDI outperforms existing e2e methods that do not embed reasoning by 10% in $\ell_{2}$ distance, while maintaining high inference speed.
Generative AI for Autonomous Driving: A Review
Generative AI (GenAI) is rapidly advancing the field of Autonomous Driving (AD), extending beyond traditional applications in text, image, and video generation. We explore how generative models can enhance automotive tasks, such as static map creation, dynamic scenario generation, trajectory forecasting, and vehicle motion planning. By examining multiple generative approaches ranging from Variational Autoencoder (VAEs) over Generative Adversarial Networks (GANs) and Invertible Neural Networks (INNs) to Generative Transformers (GTs) and Diffusion Models (DMs), we highlight and compare their capabilities and limitations for AD-specific applications. Additionally, we discuss hybrid methods integrating conventional techniques with generative approaches, and emphasize their improved adaptability and robustness. We also identify relevant datasets and outline open research questions to guide future developments in GenAI. Finally, we discuss three core challenges: safety, interpretability, and realtime capabilities, and present recommendations for image generation, dynamic scenario generation, and planning.
comment: This work has been submitted to the IEEE for possible publication
Filtering Learning Histories Enhances In-Context Reinforcement Learning
Transformer models (TMs) have exhibited remarkable in-context reinforcement learning (ICRL) capabilities, allowing them to generalize to and improve in previously unseen environments without re-training or fine-tuning. This is typically accomplished by imitating the complete learning histories of a source RL algorithm over a substantial amount of pretraining environments, which, however, may transfer suboptimal behaviors inherited from the source algorithm/dataset. Therefore, in this work, we address the issue of inheriting suboptimality from the perspective of dataset preprocessing. Motivated by the success of the weighted empirical risk minimization, we propose a simple yet effective approach, learning history filtering (LHF), to enhance ICRL by reweighting and filtering the learning histories based on their improvement and stability characteristics. To the best of our knowledge, LHF is the first approach to avoid source suboptimality by dataset preprocessing, and can be combined with the current state-of-the-art (SOTA) ICRL algorithms. We substantiate the effectiveness of LHF through a series of experiments conducted on the well-known ICRL benchmarks, encompassing both discrete environments and continuous robotic manipulation tasks, with three SOTA ICRL algorithms (AD, DPT, DICP) as the backbones. LHF exhibits robust performance across a variety of suboptimal scenarios, as well as under varying hyperparameters and sampling strategies. Notably, the superior performance of LHF becomes more pronounced in the presence of noisy data, indicating the significance of filtering learning histories.
Sketch Interface for Teleoperation of Mobile Manipulator to Enable Intuitive and Intended Operation: A Proof of Concept
Recent advancements in robotics have underscored the need for effective collaboration between humans and robots. Traditional interfaces often struggle to balance robot autonomy with human oversight, limiting their practical application in complex tasks like mobile manipulation. This study aims to develop an intuitive interface that enables a mobile manipulator to autonomously interpret user-provided sketches, enhancing user experience while minimizing burden. We implemented a web-based application utilizing machine learning algorithms to process sketches, making the interface accessible on mobile devices for use anytime, anywhere, by anyone. In the first validation, we examined natural sketches drawn by users for 27 selected manipulation and navigation tasks, gaining insights into trends related to sketch instructions. The second validation involved comparative experiments with five grasping tasks, showing that the sketch interface reduces workload and enhances intuitiveness compared to conventional axis control interfaces. These findings suggest that the proposed sketch interface improves the efficiency of mobile manipulators and opens new avenues for integrating intuitive human-robot collaboration in various applications.
comment: This paper has been accepted to the the 20th edition of the IEEE/ACM International Conference on Human-Robot Interaction (HRI'25), which will be held in Melbourne, Australia on March 4-6, 2025. Project page: https://toyotafrc.github.io/SketchInterfacePoC-Proj/
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
Ensuring the safety of autonomous vehicles requires virtual scenario-based testing, which depends on the robust evaluation and generation of safety-critical scenarios. So far, researchers have used scenario-based testing frameworks that rely heavily on handcrafted scenarios as safety metrics. To reduce the effort of human interpretation and overcome the limited scalability of these approaches, we combine Large Language Models (LLMs) with structured scenario parsing and prompt engineering to automatically evaluate and generate safety-critical driving scenarios. We introduce Cartesian and Ego-centric prompt strategies for scenario evaluation, and an adversarial generation module that modifies trajectories of risk-inducing vehicles (ego-attackers) to create critical scenarios. We validate our approach using a 2D simulation framework and multiple pre-trained LLMs. The results show that the evaluation module effectively detects collision scenarios and infers scenario safety. Meanwhile, the new generation module identifies high-risk agents and synthesizes realistic, safety-critical scenarios. We conclude that an LLM equipped with domain-informed prompting techniques can effectively evaluate and generate safety-critical driving scenarios, reducing dependence on handcrafted metrics. We release our open-source code and scenarios at: https://github.com/TUM-AVS/From-Words-to-Collisions.
comment: New version of the paper
Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods. In this work, we propose Adaptive Diffusion Constrained Sampling (ADCS), a generative framework that flexibly integrates both equality (e.g., relative and absolute pose constraints) and structured inequality constraints (e.g., proximity to object surfaces) into an energy-based diffusion model. Equality constraints are modeled using dedicated energy networks trained on pose differences in Lie algebra space, while inequality constraints are represented via Signed Distance Functions (SDFs) and encoded into learned constraint embeddings, allowing the model to reason about complex spatial regions. A key innovation of our method is a Transformer-based architecture that learns to weight constraint-specific energy functions at inference time, enabling flexible and context-aware constraint integration. Moreover, we adopt a two-phase sampling strategy that improves precision and sample diversity by combining Langevin dynamics with resampling and density-aware re-weighting. Experimental results on dual-arm manipulation tasks show that ADCS significantly improves sample diversity and generalization across settings demanding precise coordination and adaptive constraint handling.
Design of a 3-DOF Hopping Robot with an Optimized Gearbox: An Intermediate Platform Toward Bipedal Robots
This paper presents a 3-DOF hopping robot with a human-like lower-limb joint configuration and a flat foot, capable of performing dynamic and repetitive jumping motions. To achieve both high torque output and a large hollow shaft diameter for efficient cable routing, a compact 3K compound planetary gearbox was designed using mixed-integer nonlinear programming for gear tooth optimization. To meet performance requirements within the constrained joint geometry, all major components-including the actuator, motor driver, and communication interface-were custom-designed. The robot weighs 12.45 kg, including a dummy mass, and measures 840 mm in length when the knee joint is fully extended. A reinforcement learning-based controller was employed, and robot's performance was validated through hardware experiments, demonstrating stable and repetitive hopping motions in response to user inputs. These experimental results indicate that the platform serves as a solid foundation for future bipedal robot development.
Towards Robust Autonomous Landing Systems: Iterative Solutions and Key Lessons Learned
Uncrewed Aerial Vehicles (UAVs) have become a focal point of research, with both established companies and startups investing heavily in their development. This paper presents our iterative process in developing a robust autonomous marker-based landing system, highlighting the key challenges encountered and the solutions implemented. It reviews existing systems for autonomous landing processes, and through this aims to contribute to the community by sharing insights and challenges faced during development and testing.
RoboCrowd: Scaling Robot Data Collection through Crowdsourcing ICRA
In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizing crowdsourcing principles and incentive design. RoboCrowd helps enable scalable data collection and facilitates more efficient learning of robot policies. We build RoboCrowd on top of ALOHA (Zhao et al. 2023) -- a bimanual platform that supports data collection via puppeteering -- to explore the design space for crowdsourcing in-person demonstrations in a public environment. We propose three classes of incentive mechanisms to appeal to users' varying sources of motivation for interacting with the system: material rewards, intrinsic interest, and social comparison. We instantiate these incentives through tasks that include physical rewards, engaging or challenging manipulations, as well as gamification elements such as a leaderboard. We conduct a large-scale, two-week field experiment in which the platform is situated in a university cafe. We observe significant engagement with the system -- over 200 individuals independently volunteered to provide a total of over 800 interaction episodes. Our findings validate the proposed incentives as mechanisms for shaping users' data quantity and quality. Further, we demonstrate that the crowdsourced data can serve as useful pre-training data for policies fine-tuned on expert demonstrations -- boosting performance up to 20% compared to when this data is not available. These results suggest the potential for RoboCrowd to reduce the burden of robot data collection by carefully implementing crowdsourcing and incentive design principles.
comment: 21 pages, 25 figures. International Conference on Robotics and Automation (ICRA) 2025
Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey
With the real need of field exploration in large-scale and extreme outdoor environments, cooperative exploration tasks have garnered increasing attention. This paper presents a comprehensive review of multi-robot cooperative exploration systems. First, we review the evolution of robotic exploration and introduce a modular research framework tailored for multi-robot cooperative exploration. Based on this framework, we systematically categorize and summarize key system components. As a foundational module for multi-robot exploration, the localization and mapping module is primarily introduced by focusing on global and relative pose estimation, as well as multi-robot map merging techniques. The cooperative motion module is further divided into learning-based approaches and multi-stage planning, with the latter encompassing target generation, task allocation, and motion planning strategies. Given the communication constraints of real-world environments, we also analyze the communication module, emphasizing how robots exchange information within local communication ranges and under limited transmission capabilities. In addition, we introduce the actual application of multi-robot cooperative exploration systems in DARPA SubT Challenge. Finally, we discuss the challenges and future research directions for multi-robot cooperative exploration in light of real-world trends. This review aims to serve as a valuable reference for researchers and practitioners in the field.
Effective Sampling for Robot Motion Planning Through the Lens of Lattices
Sampling-based methods for motion planning, which capture the structure of the robot's free space via (typically random) sampling, have gained popularity due to their scalability, simplicity, and for offering global guarantees, such as probabilistic completeness and asymptotic optimality. Unfortunately, the practicality of those guarantees remains limited as they do not provide insights into the behavior of motion planners for a finite number of samples (i.e., a finite running time). In this work, we harness lattice theory and the concept of $(\delta,\epsilon)$-completeness by Tsao et al. (2020) to construct deterministic sample sets that endow their planners with strong finite-time guarantees while minimizing running time. In particular, we introduce a highly-efficient deterministic sampling approach based on the $A_d^*$ lattice, which is the best-known geometric covering in dimensions $\leq 21$. Using our new sampling approach, we obtain at least an order-of-magnitude speedup over existing deterministic and uniform random sampling methods for complex motion-planning problems. Overall, our work provides deep mathematical insights while advancing the practical applicability of sampling-based motion planning.
comment: To appear in Robotics: Science and Systems, 2025
M3TR: A Generalist Model for Real-World HD Map Completion
Autonomous vehicles rely on HD maps for their operation, but offline HD maps eventually become outdated. For this reason, online HD map construction methods use live sensor data to infer map information instead. Research on real map changes shows that oftentimes entire parts of an HD map remain unchanged and can be used as a prior. We therefore introduce M3TR (Multi-Masking Map Transformer), a generalist approach for HD map completion both with and without offline HD map priors. As a necessary foundation, we address shortcomings in ground truth labels for Argoverse 2 and nuScenes and propose the first comprehensive benchmark for HD map completion. Unlike existing models that specialize in a single kind of map change, which is unrealistic for deployment, our Generalist model handles all kinds of changes, matching the effectiveness of Expert models. With our map masking as augmentation regime, we can even achieve a +1.4 mAP improvement without a prior. Finally, by fully utilizing prior HD map elements and optimizing query designs, M3TR outperforms existing methods by +4.3 mAP while being the first real-world deployable model for offline HD map priors. Code is available at https://github.com/immel-f/m3tr
PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning ICML 2025
Predicting future scene representations is a crucial task for enabling robots to understand and interact with the environment. However, most existing methods rely on videos and simulations with precise action annotations, limiting their ability to leverage the large amount of available unlabeled video data. To address this challenge, we propose PlaySlot, an object-centric video prediction model that infers object representations and latent actions from unlabeled video sequences. It then uses these representations to forecast future object states and video frames. PlaySlot allows the generation of multiple possible futures conditioned on latent actions, which can be inferred from video dynamics, provided by a user, or generated by a learned action policy, thus enabling versatile and interpretable world modeling. Our results show that PlaySlot outperforms both stochastic and object-centric baselines for video prediction across different environments. Furthermore, we show that our inferred latent actions can be used to learn robot behaviors sample-efficiently from unlabeled video demonstrations. Videos and code are available on https://play-slot.github.io/PlaySlot/.
comment: ICML 2025
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation
Vision-Language Navigation (VLN) aims to guide agents by leveraging language instructions and visual cues, playing a pivotal role in embodied AI. Indoor VLN has been extensively studied, whereas outdoor aerial VLN remains underexplored. The potential reason is that outdoor aerial view encompasses vast areas, making data collection more challenging, which results in a lack of benchmarks. To address this problem, we propose OpenFly, a platform comprising various rendering engines, a versatile toolchain, and a large-scale benchmark for aerial VLN. Firstly, we integrate diverse rendering engines and advanced techniques for environment simulation, including Unreal Engine, GTA V, Google Earth, and 3D Gaussian Splatting (3D GS). Particularly, 3D GS supports real-to-sim rendering, further enhancing the realism of our environments. Secondly, we develop a highly automated toolchain for aerial VLN data collection, streamlining point cloud acquisition, scene semantic segmentation, flight trajectory creation, and instruction generation. Thirdly, based on the toolchain, we construct a large-scale aerial VLN dataset with 100k trajectories, covering diverse heights and lengths across 18 scenes. Moreover, we propose OpenFly-Agent, a keyframe-aware VLN model emphasizing key observations during flight. For benchmarking, extensive experiments and analyses are conducted, evaluating several recent VLN methods and showcasing the superiority of our OpenFly platform and agent. The toolchain, dataset, and codes will be open-sourced.
DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting ICRA2025
This work presents DLO-Splatting, an algorithm for estimating the 3D shape of Deformable Linear Objects (DLOs) from multi-view RGB images and gripper state information through prediction-update filtering. The DLO-Splatting algorithm uses a position-based dynamics model with shape smoothness and rigidity dampening corrections to predict the object shape. Optimization with a 3D Gaussian Splatting-based rendering loss iteratively renders and refines the prediction to align it with the visual observations in the update step. Initial experiments demonstrate promising results in a knot tying scenario, which is challenging for existing vision-only methods.
comment: 5 pages, 2 figures, presented at the 2025 5th Workshop: Reflections on Representations and Manipulating Deformable Objects at the IEEE International Conference on Robotics and Automation. RMDO workshop (https://deformable-workshop.github.io/icra2025/). Video (https://www.youtube.com/watch?v=CG4WDWumGXA). Poster (https://hollydinkel.github.io/assets/pdf/ICRA2025RMDO_poster.pdf)
ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards
Quadrotor control policies can be trained with high performance using the exact gradients of the rewards to directly optimize policy parameters via backpropagation-through-time (BPTT). However, designing a fully differentiable reward architecture is often challenging. Partially differentiable rewards will result in biased gradient propagation that degrades training performance. To overcome this limitation, we propose Amended Backpropagation-through-Time (ABPT), a novel approach that mitigates gradient bias while preserving the training efficiency of BPTT. ABPT combines 0-step and N-step returns, effectively reducing the bias by leveraging value gradients from the learned Q-value function. Additionally, it adopts entropy regularization and state initialization mechanisms to encourage exploration during training. We evaluate ABPT on four representative quadrotor flight tasks \li{in both real world and simulation}. Experimental results demonstrate that ABPT converges significantly faster and achieves higher ultimate rewards than existing learning algorithms, particularly in tasks involving partially differentiable rewards. The code will be released at http://github.com/Fanxing-LI/ABPT.
DualLQR: Efficient Grasping of Oscillating Apples using Task Parameterized Learning from Demonstration
Learning from Demonstration offers great potential for robots to learn to perform agricultural tasks, specifically selective harvesting. One of the challenges is that the target fruit can be oscillating while approaching. Grasping oscillating targets has two requirements: 1) close tracking of the target during the final approach for damage-free grasping, and 2) the complete path should be as short as possible for improved efficiency. We propose a new method called DualLQR. In this method, we use a finite horizon Linear Quadratic Regulator (LQR) on a moving target, without the need of refitting the LQR. To make this possible, we use a dual LQR set-up, with an LQR running in two separate reference frames. Through extensive simulation testing, it was found that the state-of-art method barely meets the required final accuracy without oscillations and drops below the required accuracy with an oscillating target. DualLQR, on the other hand, was found to be able to meet the required final accuracy even with high oscillations, while travelling the least distance. Further testing on a real-world apple grasping task showed that DualLQR was able to successfully grasp oscillating apples, with a success rate of 99%.
comment: Accepted to IAS-19 url: https://openreview.net/forum?id=vipwIaog1u
Occupancy-SLAM: An Efficient and Robust Algorithm for Simultaneously Optimizing Robot Poses and Occupancy Map
Joint optimization of poses and features has been extensively studied and demonstrated to yield more accurate results in feature-based SLAM problems. However, research on jointly optimizing poses and non-feature-based maps remains limited. Occupancy maps are widely used non-feature-based environment representations because they effectively classify spaces into obstacles, free areas, and unknown regions, providing robots with spatial information for various tasks. In this paper, we propose Occupancy-SLAM, a novel optimization-based SLAM method that enables the joint optimization of robot trajectory and the occupancy map through a parameterized map representation. The key novelty lies in optimizing both robot poses and occupancy values at different cell vertices simultaneously, a significant departure from existing methods where the robot poses need to be optimized first before the map can be estimated. Evaluations using simulations and practical 2D laser datasets demonstrate that the proposed approach can robustly obtain more accurate robot trajectories and occupancy maps than state-of-the-art techniques with comparable computational time. Preliminary results in the 3D case further confirm the potential of the proposed method in practical 3D applications, achieving more accurate results than existing methods.
comment: Accepted for publication in the IEEE Transactions on Robotics (T-RO), 2025
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers
Modern deep policy gradient methods achieve effective performance on simulated robotic tasks, but they all require large replay buffers or expensive batch updates, or both, making them incompatible for real systems with resource-limited computers. We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer. We propose a novel incremental deep policy gradient method -- Action Value Gradient (AVG) and a set of normalization and scaling techniques to address the challenges of instability in incremental learning. On robotic simulation benchmarks, we show that AVG is the only incremental method that learns effectively, often achieving final performance comparable to batch policy gradient methods. This advancement enabled us to show for the first time effective deep reinforcement learning with real robots using only incremental updates, employing a robotic manipulator and a mobile robot.
comment: In The Thirty-eighth Annual Conference on Neural Information Processing Systems. Source code at https://github.com/gauthamvasan/avg and companion video at https://youtu.be/cwwuN6Hyew0
Learning Novel Skills from Language-Generated Demonstrations ICLR
Robots are increasingly deployed across diverse domains to tackle tasks requiring novel skills. However, current robot learning algorithms for acquiring novel skills often rely on demonstration datasets or environment interactions, resulting in high labor costs and potential safety risks. To address these challenges, this study proposes DemoGen, a skill-learning framework that enables robots to acquire novel skills from natural language instructions. DemoGen leverages the vision-language model and the video diffusion model to generate demonstration videos of novel skills, which enabling robots to learn new skills effectively. Experimental evaluations in the MetaWorld simulation environments demonstrate the pipeline's capability to generate high-fidelity and reliable demonstrations. Using the generated demonstrations, various skill learning algorithms achieve an accomplishment rate three times the original on novel tasks. These results highlight a novel approach to robot learning, offering a foundation for the intuitive and intelligent acquisition of novel robotic skills. (Project website: https://aoqunjin.github.io/LNSLGD/)
comment: 10 pages, International Conference on Learning Representations (ICLR) 2025 Workshop on Generative Models for Robot Learning (GenBot)
MoE-Loco: Mixture of Experts for Multitask Locomotion
We present MoE-Loco, a Mixture of Experts (MoE) framework for multitask locomotion for legged robots. Our method enables a single policy to handle diverse terrains, including bars, pits, stairs, slopes, and baffles, while supporting quadrupedal and bipedal gaits. Using MoE, we mitigate the gradient conflicts that typically arise in multitask reinforcement learning, improving both training efficiency and performance. Our experiments demonstrate that different experts naturally specialize in distinct locomotion behaviors, which can be leveraged for task migration and skill composition. We further validate our approach in both simulation and real-world deployment, showcasing its robustness and adaptability.
comment: 9 pages, 10 figures
FaVoR: Features via Voxel Rendering for Camera Relocalization WACV
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
comment: In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, Arizona, US, Feb 28-Mar 4, 2025
Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR
The driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper proposes a trajectory planning method for intelligent vehicles based on the risk reachable set to further improve the safety of trajectory planning. First, we construct the reachable set incorporating the driving risk field to more accurately assess and avoid potential risks in drivable areas. Then, the initial trajectory is generated based on safe reinforcement learning and projected onto the reachable set. Finally, we introduce a trajectory planning method based on a constrained iterative quadratic regulator to optimize the initial solution, ensuring that the planned trajectory achieves optimal comfort, safety, and efficiency. We conduct simulation tests of trajectory planning in high-speed lane-changing scenarios. The results indicate that the proposed method can guarantee trajectory comfort and driving efficiency, with the generated trajectory situated outside high-risk boundaries, thereby ensuring vehicle safety during operation.
comment: We sincerely request the withdrawal of this paper. After further research and review, we have found that certain parts of the content contain uncertainties and are not sufficient to support the conclusions previously drawn. To avoid any potential misunderstanding or misguidance to the research community, we have decided to voluntarily withdraw the manuscript
Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation
Long-horizon dexterous robot manipulation of deformable objects, such as banana peeling, is a problematic task because of the difficulties in object modeling and a lack of knowledge about stable and dexterous manipulation skills. This paper presents a goal-conditioned dual-action (GC-DA) deep imitation learning (DIL) approach that can learn dexterous manipulation skills using human demonstration data. Previous DIL methods map the current sensory input and reactive action, which often fails because of compounding errors in imitation learning caused by the recurrent computation of actions. The method predicts reactive action only when the precise manipulation of the target object is required (local action) and generates the entire trajectory when precise manipulation is not required (global action). This dual-action formulation effectively prevents compounding error in the imitation learning using the trajectory-based global action while responding to unexpected changes in the target object during the reactive local action. The proposed method was tested in a real dual-arm robot and successfully accomplished the banana-peeling task. Data from this and related works are available at: https://sites.google.com/view/multi-task-fine.
comment: 19 pages, published in Transactions on Robotics (T-RO)
Transformer-based deep imitation learning for dual-arm robot manipulation IROS
Deep imitation learning is promising for solving dexterous manipulation tasks because it does not require an environment model and pre-programmed robot behavior. However, its application to dual-arm manipulation tasks remains challenging. In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulators causes distractions and results in poor performance of the neural networks. We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements. A Transformer, a variant of self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world. The proposed method has been tested on dual-arm manipulation tasks using a real robot. The experimental results demonstrated that the Transformer-based deep imitation learning architecture can attend to the important features among the sensory inputs, therefore reducing distractions and improving manipulation performance when compared with the baseline architecture without the self-attention mechanisms. Data from this and related works are available at: https://sites.google.com/view/multi-task-fine.
comment: 8 pages. Accepted in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
VITAL: Interactive Few-Shot Imitation Learning via Visual Human-in-the-Loop Corrections
Imitation Learning (IL) has emerged as a powerful approach in robotics, allowing robots to acquire new skills by mimicking human actions. Despite its potential, the data collection process for IL remains a significant challenge due to the logistical difficulties and high costs associated with obtaining high-quality demonstrations. To address these issues, we propose a large-scale data generation from a handful of demonstrations through data augmentation in simulation. Our approach leverages affordable hardware and visual processing techniques to collect demonstrations, which are then augmented to create extensive training datasets for imitation learning. By utilizing both real and simulated environments, along with human-in-the-loop corrections, we enhance the generalizability and robustness of the learned policies. We evaluated our method through several rounds of experiments in both simulated and real-robot settings, focusing on tasks of varying complexity, including bottle collecting, stacking objects, and hammering. Our experimental results validate the effectiveness of our approach in learning robust robot policies from simulated data, significantly improved by human-in-the-loop corrections and real-world data integration. Additionally, we demonstrate the framework's capability to generalize to new tasks, such as setting a drink tray, showcasing its adaptability and potential for handling a wide range of real-world manipulation tasks. A video of the experiments can be found at: https://youtu.be/YeVAMRqRe64?si=R179xDlEGc7nPu8i
Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation
A high-precision manipulation task, such as needle threading, is challenging. Physiological studies have proposed connecting low-resolution peripheral vision and fast movement to transport the hand into the vicinity of an object, and using high-resolution foveated vision to achieve the accurate homing of the hand to the object. The results of this study demonstrate that a deep imitation learning based method, inspired by the gaze-based dual resolution visuomotor control system in humans, can solve the needle threading task. First, we recorded the gaze movements of a human operator who was teleoperating a robot. Then, we used only a high-resolution image around the gaze to precisely control the thread position when it was close to the target. We used a low-resolution peripheral image to reach the vicinity of the target. The experimental results obtained in this study demonstrate that the proposed method enables precise manipulation tasks using a general-purpose robot manipulator and improves computational efficiency. Data from this and related works are available at: https://sites.google.com/view/multi-task-fine.
comment: 8 pages. The supplementary video can be found at: https://www.youtube.com/watch?v=ytpChcFqD5g Published in IEEE Robotics and Automation Letters. Replaced to add video url in the manuscript
Systems and Control (CS)
Distributionally Robust Planning of Hydrogen-Electrical Microgrids for Sea Islands
This paper presents a distributionally robust planning method for hydrogen-electrical microgrids over islands, where the cross-island energy exchange is supported by a maritime hydrogen transport network. This planning problem is complicated due to heterogeneous off-shore wind-driven uncertainties (i.e., renewable power, transport availability, demand fluctuations, and grid faulting), a subset of which exhibit endogenous uncertainty, as they can be affected by proactive measures (e.g., grid hardening) or infrastructure investment. To capture these features, a two-stage distributionally robust optimization (DRO) model is developed considering decision-dependent uncertainty (DDU), which encompasses variation of the underlying distributional ambiguity due to the change of the first stage decisions. Notably, the complete recourse property is missing, which is often neglected in existing DRO studies. Nevertheless, different from the case for land-based microgrids, this issue is critical and fundamental for sea island systems due to their particular physical and logistical requirements. To address these issues, we develop a C&CG algorithm that is customized with strong cutting planes to handle DRO with a varying DDU ambiguity set and feasibility requirements. Numerical results demonstrate the cost-effectiveness and resilience of the proposed planning framework, along with the nontrivial improvements of the algorithm in both solution accuracy and computational efficiency.
Path Planning Algorithm Comparison Analysis for Wireless AUVs Energy Sharing System
Autonomous underwater vehicles (AUVs) are increasingly used in marine research, military applications, and undersea exploration. However, their operational range is significantly affected by battery performance. In this paper, a framework for a wireless energy sharing system among AUVs is proposed, enabling rapid energy replenishment. Path planning plays a crucial role in the energy-sharing process and autonomous navigation, as it must generate feasible trajectories toward designated goals. This article focuses on efficient obstacle avoidance in complex underwater environments, including irregularly shaped obstacles and narrow passages. The proposed method combines Rapidly-exploring Random Trees Star (RRT*) with Particle Swarm Optimization (PSO) to improve path planning efficiency. Comparative analysis of the two algorithms is presented through simulation results in both random and irregular obstacle environments. Index Terms: Wireless charging, autonomous underwater vehicles (AUVs), path planning, irregular obstacles, narrow passages, RRT*, particle swarm optimization (PSO).
comment: IECON 2023- 49th Annual Conference of the IEEE Industrial Electronics Society
Deep Learning for Continuous-time Stochastic Control with Jumps
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
Self-powered smart contact lenses: a multidisciplinary approach to micro-scale energy and 900 MHz - 1.1 GHz bandwidth microfabricated loop antennas communication systems
Smart contact lenses are at the forefront of integrating microelectronics, biomedical engineering, and optics into wearable technologies. This work addresses a key obstacle in their development: achieving autonomous power without compromising safety or miniaturization. We examine energy harvesting strategies using intrinsic ocular sources-particularly tear salinity and eyelid motion-to enable sustainable operation without external batteries. The study emphasizes compact loop antennas operating between 900 MHz and 1.1 GHz as critical for wireless data transmission and power management. Material choices, signal integrity, and biocompatibility are also discussed. By presenting recent advances in 3D-printed optics, antenna integration, and energy systems, we propose a conceptual framework for the next generation of smart lenses, enabling real-time health monitoring and vision enhancement through self-powered, compact devices.
comment: 16 pages, 3 figures
World Models as Reference Trajectories for Rapid Motor Adaptation
Deploying learned control policies in real-world environments poses a fundamental challenge. When system dynamics change unexpectedly, performance degrades until models are retrained on new data. We introduce Reflexive World Models (RWM), a dual control framework that uses world model predictions as implicit reference trajectories for rapid adaptation. Our method separates the control problem into long-term reward maximization through reinforcement learning and robust motor execution through rapid latent control. This dual architecture achieves significantly faster adaptation with low online computational cost compared to model-based RL baselines, while maintaining near-optimal performance. The approach combines the benefits of flexible policy learning through reinforcement learning with rapid error correction capabilities, providing a principled approach to maintaining performance in high-dimensional continuous control tasks under varying dynamics.
Decreasing Utilization of Systems with Multi-Rate Cause-Effect Chains While Reducing End-to-End Latencies
The Logical Execution Time (LET) model has deterministic properties which dramatically reduce the complexity of analyzing temporal requirements of multi-rate cause-effect chains. The configuration (length and position) of task's communication intervals directly define which task instances propagate data through the chain and affect end-to-end latencies. Since not all task instances propagate data through the chain, the execution of these instances wastes processing resources. By manipulating the configuration of communication intervals, it is possible to control which task instances are relevant for data propagation and end-to-end latencies. However, since tasks can belong to more than one cause-effect chain, the problem of configuring communication intervals becomes non-trivial given the large number of possible configurations. In this paper, we present a method to decrease the waste of processing resources while reducing end-to-end latencies. We use a search algorithm to analyze different communication interval configurations and find the combination that best decrease system utilization while reducing end-to-end latencies. By controlling data propagation by means of precedence constraints, our method modifies communication intervals and controls which task instances affect end-to-end latencies. Despite the sporadic release time of some task instances during the analysis, our method transforms those instances into periodic tasks. We evaluate our work using synthetic task sets and the automotive benchmark proposed by BOSCH for the WATERS industrial challenge.
comment: Accepted and to appear in ISORC25 - 28th IEEE International Symposium On Real-Time Distributed Computing. May 2025
SpanTrain: Highly Efficient Cross-domain Model Distributed Training System under Heterogeneous GPUs and Networks in CEE Environment
Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model training system. We propose SpanTrain, a geo-distributed model training system tailored for heterogeneous GPUs and networks in CEE environments. SpanTrain adopts a communication-centric design philosophy to tackle challenges arising from slow and unstable inter-region networks. It begins with a heterogeneous device profiler that identifies and groups devices based on both network and compute characteristics. Leveraging device groups, SpanTrain implements compact, zero-bubble pipeline parallelism, automatically deriving optimal parallel strategies. To further adapt to runtime variability, SpanTrain integrates a dynamic environment adapter that reacts to network fluctuations. Extensive evaluations demonstrate that SpanTrain achieves 1.3-2.8x higher training throughput compared to widely used and SOTA training systems.
Continuous-time iterative linear-quadratic regulator
We present a continuous-time equivalent to the well-known iterative linear-quadratic algorithm including an implementation of a backtracking line-search policy and a novel regularization approach based on the necessary conditions in the Riccati pass of the linear-quadratic regulator. This allows the algorithm to effectively solve trajectory optimization problems with non-convex cost functions, which is demonstrated on the cart-pole swing-up problem. The algorithm compatibility with state-of-the-art suites of numerical integration solvers allows for the use of high-order adaptive-step methods. Their use results in a variable number of time steps both between passes of the algorithm and across iterations, maintaining a balance between the number of function evaluations and the discretization error.
comment: 6 pages, 3 figures, submitted March 31, 2025, to Decision and Control (CDC 2025)
From learning to safety: A Direct Data-Driven Framework for Constrained Control
Ensuring safety in the sense of constraint satisfaction for learning-based control is a critical challenge, especially in the model-free case. While safety filters address this challenge in the model-based setting by modifying unsafe control inputs, they typically rely on predictive models derived from physics or data. This reliance limits their applicability for advanced model-free learning control methods. To address this gap, we propose a new optimization-based control framework that determines safe control inputs directly from data. The benefit of the framework is that it can be updated through arbitrary model-free learning algorithms to pursue optimal performance. As a key component, the concept of direct data-driven safety filters (3DSF) is first proposed. The framework employs a novel safety certificate, called the state-action control barrier function (SACBF). We present three different schemes to learn the SACBF. Furthermore, based on input-to-state safety analysis, we present the error-to-state safety analysis framework, which provides formal guarantees on safety and recursive feasibility even in the presence of learning inaccuracies. The proposed control framework bridges the gap between model-free learning-based control and constrained control, by decoupling performance optimization from safety enforcement. Simulations on vehicle control illustrate the superior performance regarding constraint satisfaction and task achievement compared to model-based methods and reward shaping.
Certified Neural Approximations of Nonlinear Dynamics
Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it outperforms the state-of-the-art. Furthermore, we highlight the flexibility of our framework by applying it to two novel scenarios not previously explored in this context: neural network compression and an autoencoder-based deep learning architecture for learning Koopman operators, both yielding compelling results.
comment: first and second author contributed equally
AI-based Decision Support System for Heritage Aircraft Corrosion Prevention
The paper presents a decision support system for the long-term preservation of aeronautical heritage exhibited/stored in sheltered sites. The aeronautical heritage is characterized by diverse materials of which this heritage is constituted. Heritage aircraft are made of ancient aluminum alloys, (ply)wood, and particularly fabrics. The decision support system (DSS) designed, starting from a conceptual model, is knowledge-based on degradation/corrosion mechanisms of prevailing materials of aeronautical heritage. In the case of historical aircraft wooden parts, this knowledge base is filled in by the damage function models developed within former European projects. Model-based corrosion prediction is implemented within the new DSS for ancient aluminum alloys. The novelty of this DSS consists of supporting multi-material heritage protection and tailoring to peculiarities of aircraft exhibition/storage hangars and the needs of aviation museums. The novel DSS is tested on WWII aircraft heritage exhibited in the Aviation Museum Kbely, Military History Institute Prague, Czech Republic.
comment: 6 pages, 4 figures, 4 tables, submitted January 31, 2025, to Process Control 2025
FAV-NSS: An HIL Framework for Accelerating Validation of Automotive Network Security Strategies
Complex electronic control unit (ECU) architectures, software models and in-vehicle networks are consistently improving safety and comfort functions in modern vehicles. However, the extended functionality and increased connectivity introduce new security risks and vulnerabilities that can be exploited on legacy automotive networks such as the controller area network (CAN). With the rising complexity of vehicular systems and attack vectors, the need for a flexible hardware-in-the-loop (HIL) test fixture that can inject attacks and validate the performance of countermeasures in near-real-world conditions in real time is vital. This paper presents an FPGA-based HIL framework tailored towards validating network security approaches (IDS, IPS) and smart integration strategies of such capabilities for an automotive CAN bus. FAV-NSS replicates an actual vehicular system environment with functional ECUs and network infrastructure on an FPGA, allowing functional validation of IDS/IPS algorithms, accelerator designs and integration schemes (software task on ECU, dedicated accelerator). To show the efficacy of FAV-NSS, we evaluate an IDS accelerator integration problem, both as a traditional coupled accelerator (to the ECU), and secondly close to the CAN controller (mimicking an extended CAN controller). We show that the latter strategy can be fully validated by our framework, which would otherwise require integration of specialised CAN modules into otherwise standard HIL fixtures with ability to instrument internal signals for characterising timing performance. The tests demonstrate a promising latency reduction of 6.3x when compared to the traditional coupled accelerator. Our case study demonstrates the potential of FAV-NSS for accelerating the optimisation, integration and verification of smart ECUs and communication controllers in current and future vehicular systems.
comment: Accepted to 2025 IEEE 36th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Impact of Wind Generation on Risk-based Security Assessment of Power System
The electric power system is one of the largest and most intricate infrastructures. Therefore, it is critical to assess and maintain its security. A power system security assessment is indispensable for identifying post-contingency issues, taking corrective measures, and protecting the system from blackouts. This paper examined the impact of wind generation on the risk-based security assessment of a power transmission network in the context of planning. DIgSILENT PowerFactory software was used to conduct the analysis using a combination of the brute force technique and the nonsequential Monte Carlo (MC) simulation method on the IEEE 39-bus transmission test system. Optimal power flow (OPF) was used to quantify security, considering (N-1), (N-2), and (N-3) line outages and an (N-1) bus outage. Moreover, the average cost deviation from the mean optimal system operating cost was proposed as a novel security indicator. The results obtianed accurately depicted the effects of changing wind generation levels on system security in terms of risk. The most and least critical line(s) and bus in the system, for different wind generation levels, were also determined. Moreover, the worst-case wind-generation threshold level using two different cost functions for wind was identified.
A Risk-Based Probabilistic Transient Stability Approach for Ranking of Circuit Breakers in a Power System
Power systems are getting more complex than ever and are consequently operating close to their limit of stability. Moreover, with the increasing demand of renewable wind generation, and the requirement to maintain a secure power system, the importance of transient stability cannot be overestimated. Current deterministic industry practices of transient stability assessment ignore the probability of variables involved. With increasing system uncertainties and widespread electricity market deregulation, there is a strong inevitability to incorporate probabilistic transient stability analysis. Circuit breakers play a critical role in fault clearing and consequently in determining the system transient stability. It is important that they undergo timely and appropriate maintenance procedures based on some criterion. Considering the need of incorporating risk in modern power systems, this paper proposes a risk-based probabilistic transient stability approach for ranking of circuit breakers in a power system. A novel priority index was proposed to rank the circuit breakers based on the system transient stability risk. DIgSILENT PowerFactory software was used to conduct the required simulations on IEEE 14 bus system. The proposed risk-based framework was deemed to be efficient in identification of the circuit breakers based on their priority rank index which can aid in power system planning process.
Reliable Vertical Federated Learning in 5G Core Network Architecture
This work proposes a new algorithm to mitigate model generalization loss in Vertical Federated Learning (VFL) operating under client reliability constraints within 5G Core Networks (CNs). Recently studied and endorsed by 3GPP, VFL enables collaborative and load-balanced model training and inference across the CN. However, the performance of VFL significantly degrades when the Network Data Analytics Functions (NWDAFs) - which serve as primary clients for VFL model training and inference - experience reliability issues stemming from resource constraints and operational overhead. Unlike edge environments, CN environments adopt fundamentally different data management strategies, characterized by more centralized data orchestration capabilities. This presents opportunities to implement better distributed solutions that take full advantage of the CN data handling flexibility. Leveraging this flexibility, we propose a method that optimizes the vertical feature split among clients while centrally defining their local models based on reliability metrics. Our empirical evaluation demonstrates the effectiveness of our proposed algorithm, showing improved performance over traditional baseline methods.
comment: Globecom Submission
Comparing Parameterizations and Objective Functions for Maximizing the Volume of Zonotopic Invariant Sets
In formal safety verification, many proposed algorithms use parametric set representations and convert the computation of the relevant sets into an optimization problem; consequently, the choice of parameterization and objective function have a significant impact on the efficiency and accuracy of the resulting computation. In particular, recent papers have explored the use of zonotope set representations for various types of invariant sets. In this paper we collect two zonotope parameterizations that are numerically well-behaved and demonstrate that the volume of the corresponding zonotopes is log-concave in the parameters. We then experimentally explore the use of these two parameterizations in an algorithm for computing the maximum volume zonotope invariant under affine dynamics within a specified box constraint over a finite horizon. The true volume of the zonotopes is used as an objective function, along with two alternative heuristics that are faster to compute. We conclude that the heuristics are much faster in practice, although the relative quality of their results declines as the dimension of the problem increases; however, our conclusions are only preliminary due to so-far limited availability of compute resources.
A Neural Network Approach to a Modified Quadratic Boost Multiport Resonant Converter for Electric Vehicle Chargers
This topology can achieve a high step-up gain by utilizing a switched capacitor and switched inductor-based VMC network arrangement.Furthermore, the proposed topology can achieve an output gain of approximately three times at a nominal duty ratio with reduced voltage and current stress across the switch, and enhance the maximum efficiency to 96.7
comment: 17 Pages, 21 figures
Co-optimize condenser water temperature and cooling tower fan using high-fidelity synthetic data
This paper introduces a novel method for optimizing HVAC systems in buildings by integrating a high-fidelity physics-based simulation model with machine learning and measured data. The method enables a real-time building advisory system that provides optimized settings for condenser water loop operation, assisting building operators in decision-making. The building and its HVAC system are first modeled using eQuest. Synthetic data are then generated by running the simulation multiple times. The data are then processed, cleaned, and used to train the machine learning model. The machine learning model enables real-time optimization of the condenser water loop using particle swarm optimization. The results deliver both a real-time online optimizer and an offline operation look-up table, providing optimized condenser water temperature settings and the optimal number of cooling tower fans at a given cooling load. Potential savings are calculated by comparing measured data from two summer months with the energy costs the building would have experienced under optimized settings. Adaptive model refinement is applied to further improve accuracy and effectiveness by utilizing available measured data. The method bridges the gap between simulation and real-time control. It has the potential to be applied to other building systems, including the chilled water loop, heating systems, ventilation systems, and other related processes. Combining physics models, data models, and measured data also enables performance analysis, tracking, and retrofit recommendations.
Green Hacks: Generating Sustainability-Targeting Attacks For Cyber-Physical Systems
Sustainability-targeting attacks (STA) or "Green Hacks" are a growing threat to cyber-physical system (CPS)-based infrastructure, as its performance objectives are increasingly linked to sustainability goals. These attacks exploit the interdependence between control, energy efficiency, and environmental impact to degrade systems' overall performance. Thus, in this work, we propose a general mathematical framework for modeling such STA and derive the feasibility conditions for generating a worst-case STA on a linear CPS using a max-min formulation. A gradient ascent descent algorithm is used to construct the worst-case attack policy. We simulated the worst-case STA for a linear CPS to illustrate its impacts on the CPS performance and sustainability cost.
comment: 10 pages, 3 figures
A Distributed Local Energy Market Clearing Framework Using a Two-Loop ADMM Method
The diversity of prosumers' resources in energy communities can provide significant technical and economic benefits to both prosumers and the distribution system operator (DSO). To maximize these benefits, a coordination framework is required to address all techno-economic constraints as well as the objectives of all agents. This paper presents a fully distributed market-clearing scheme to coordinate the strategies of agents within a local energy community. In the proposed framework, prosumers, the DSO, and the local market operator (LMO) are the participating agents. The framework addresses the preferences and techno-economic constraints of all actors while preserving their privacy. The proposed model is based on a modified alternating direction method of multipliers (ADMM) method with two outer and inner loops; the outer loop models the interactions between the LMO and prosumers, while the inner loop addresses the interactions between the LMO and the DSO. The model is demonstrated on IEEE-69bus test network, showcasing its effectiveness from various perspectives.
comment: This publication is based upon work supported by King Abdullah University of Science and Technology under Award No. RFS-OFP2023-5505. The publication has been accepted at the 2025 IEEE PowerTech conference in Germany
A Quantum-Enhanced Power Flow and Optimal Power Flow based on Combinatorial Reformulation
This study introduces the Adiabatic Quantum Power Flow (AQPF) and Adiabatic Quantum Optimal Power Flow (AQOPF) algorithms to solve power flow (PF) and optimal power flow (OPF) problems, respectively. These algorithms utilize a novel combinatorial optimization reformulation of classical PF and OPF problems, and hence, enable their implementation on Ising machines, e.g., quantum and quantum-inspired hardware. The experiments are conducted on standard test cases ranging from 4-bus to 1354-bus systems, using D-Wave's Advantage system (QA), D-Wave's quantum-classical hybrid solver (HA), Fujitsu's Digital Annealer V3 (DAv3), and Fujitsu's Quantum-Inspired Integrated Optimization software (QIIO). The annealers are systematically evaluated based on: (i) full and partitioned formulations, (ii) ability to handle ill-conditioned cases, and (iii) scalability. The results are benchmarked against the Newton-Raphson numerical method (NR) and suggest that AQPF and AQOPF can serve as effective solvers or complementary tools to classical methods to address unsolved challenges in large-scale modern power systems.
comment: 10 pages, 1 pseudo code, 2 figures, 4 tables
Extremum Seeking for PDE Systems using Physics-Informed Neural Networks
Extremum Seeking (ES) is an effective real-time optimization method for PDE systems in cascade with nonlinear quadratic maps. To address PDEs in the feedback loop, a boundary control law and a re-design of the additive probing signal are mandatory. The latter, commonly called "trajectory generation" or "motion planning," involves designing perturbation signals that anticipate their propagation through PDEs. Specifically, this requires solving motion planning problems for systems governed by parabolic and hyperbolic PDEs. Physics-Informed Neural Networks (PINN) is a powerful tool for solving PDEs by embedding physical laws as constraints in the neural network's loss function, enabling efficient solutions for high-dimensional, nonlinear, and complex problems. This paper proposes a novel construction integrating PINN and ES, automating the motion planning process for specific PDE systems and eliminating the need for case-by-case analytical derivations. The proposed strategy efficiently extracts perturbation signals, optimizing the PDE system.
comment: 23 pages, 16 figures
Gaussian Processes in Power Systems: Techniques, Applications, and Future Works
The increasing integration of renewable energy sources (RESs) and distributed energy resources (DERs) has significantly heightened operational complexity and uncertainty in modern power systems. Concurrently, the widespread deployment of smart meters, phasor measurement units (PMUs) and other sensors has generated vast spatiotemporal data streams, enabling advanced data-driven analytics and decision-making in grid operations. In this context, Gaussian processes (GPs) have emerged as a powerful probabilistic framework, offering uncertainty quantification, non-parametric modeling, and predictive capabilities to enhance power system analysis and control. This paper presents a comprehensive review of GP techniques and their applications in power system operation and control. GP applications are reviewed across three key domains: GP-based modeling, risk assessment, and optimization and control. These areas serve as representative examples of how GP can be utilized in power systems. Furthermore, critical challenges in GP applications are discussed, and potential research directions are outlined to facilitate future power system operations.
Constant-Sum High-Order Barrier Functions for Safety Between Parallel Boundaries
This paper takes a step towards addressing the difficulty of constructing Control Barrier Functions (CBFs) for parallel safety boundaries. A single CBF for both boundaries has been reported to be difficult to validate for safety, and we identify why this challenge is inherent. To overcome this, the proposed method constructs separate CBFs for each boundary. We begin by presenting results for the relative degree one case and then extend these to higher relative degrees using the CBF backstepping technique, establishing conditions that guarantee safety. Finally, we showcase our method by applying it to a unicycle system, deriving a simple, verifiable condition to validate the target CBFs for direct implementation of our results.
comment: Submitted to IEEE L-CSS and 2025 Conference on Decision and Control (CDC)
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
Development of a Scaled Setup for Experimental Study of the Effect of Lateral Dynamics on Energy Consumption in Electric Vehicles: An Extension
Most of the existing state-of-the-art approaches for energy consumption analysis do not account for the effect of lateral dynamics on energy consumption in electric vehicles (EVs) during vehicle maneuvers. This paper aims to validate this effect through an experimental study. We develop a scaled model using a radio-controlled (RC) car, modified to achieve dynamic similitude with on-road vehicles, to conduct scaled experiments. The experimental results confirm the impact of lateral dynamics on both energy demand and driving range in electric vehicles, aligning with our previous findings [1], and emphasize the need to incorporate these factors into energy consumption models. This is an extended version of a paper accepted at IEEE ITEC 2025. It includes additional results and analysis.
Probabilistic State Estimation of Timed Probabilistic Discrete Event Systems via Artificial Neural Networks [Draft Version]
This paper is about the state estimation of timed probabilistic discrete event systems. The main contribution is to propose general procedures for developing state estimation approaches based on artificial neural networks. It is assumed that no formal model of the system exists but a data set is available, which contains the history of the timed behaviour of the systems. This dataset will be exploited to develop a neural network model that uses both logical and temporal information gathered during the functioning of the system as inputs and provides the state probability vector as output. Two main approaches are successively proposed (i) state estimation of timed probabilistic discrete event systems over observations: in this case the state estimate is reconstructed at the occurrence of each new observation; (ii) state estimation of timed probabilistic discrete event systems over time: in this case the state estimate is reconstructed at each clock time increment. For each approach, the paper outlines the process of data preprocessing, model building and implementation. This paper not only proposes groundbreaking approaches but also opens the door to further exploitation of artificial neural networks for the benefit of discrete event systems.
Vanishing Stacked-Residual PINN for State Reconstruction of Hyperbolic Systems
In a more connected world, modeling multi-agent systems with hyperbolic partial differential equations (PDEs) offers a compact, physics-consistent description of collective dynamics. However, classical control tools need adaptation for these complex systems. Physics-informed neural networks (PINNs) provide a powerful framework to fix this issue by inferring solutions to PDEs by embedding governing equations into the neural network. A major limitation of original PINNs is their inability to capture steep gradients and discontinuities in hyperbolic PDEs. To tackle this problem, we propose a stacked residual PINN method enhanced with a vanishing viscosity mechanism. Initially, a basic PINN with a small viscosity coefficient provides a stable, low-fidelity solution. Residual correction blocks with learnable scaling parameters then iteratively refine this solution, progressively decreasing the viscosity coefficient to transition from parabolic to hyperbolic PDEs. Applying this method to traffic state reconstruction improved results by an order of magnitude in relative $\mathcal{L}^2$ error, demonstrating its potential to accurately estimate solutions where original PINNs struggle with instability and low fidelity.
Controlling a Social Network of Individuals with Coevolving Actions and Opinions
In this paper, we consider a population of individuals who have actions and opinions, which coevolve, mutually influencing one another on a complex network structure. In particular, we formulate a control problem for this social network, in which we assume that we can inject into the network a committed minority -- a set of stubborn nodes -- with the objective of steering the population, initially at a consensus, to a different consensus state. Our study focuses on two main objectives: i) determining the conditions under which the committed minority succeeds in its goal, and ii) identifying the optimal placement for such a committed minority. After deriving general monotone convergence result for the controlled dynamics, we leverage these results to build a computationally-efficient algorithm to solve the first problem and an effective heuristics for the second problem, which we prove to be NP-complete. For both algorithms, we establish theoretical guarantees. The proposed methodology is illustrated though academic examples, and demonstrated on a real-world case study.
comment: 12 pages, 6 figures. Under Review
Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints
This paper studies the remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the \emph{semantics of information} and consider that the remote actuator has different tolerances for the estimation errors. We aim to find an optimal scheduling policy that minimizes the long-term \textit{state-dependent} costs of estimation errors under a transmission frequency constraint. The optimal scheduling problem is formulated as a \emph{constrained Markov decision process} (CMDP). We show that the optimal Lagrangian cost follows a piece-wise linear and concave (PWLC) function, and the optimal policy is, at most, a randomized mixture of two simple deterministic policies. By exploiting the structural results, we develop a new \textit{intersection search} algorithm that finds the optimal policy using only a few iterations. We further propose a reinforcement learning (RL) algorithm to compute the optimal policy without knowing \textit{a priori} the channel and source statistics. To avoid the ``curse of dimensionality" in MDPs, we propose an online low-complexity \textit{drift-plus-penalty} (DPP) algorithm. Numerical results show that continuous transmission is inefficient, and remarkably, our semantic-aware policies can attain the optimum by strategically utilizing fewer transmissions by exploiting the timing of the important information.
comment: Accepted by the IEEE Transactions on Communications
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers
Modern deep policy gradient methods achieve effective performance on simulated robotic tasks, but they all require large replay buffers or expensive batch updates, or both, making them incompatible for real systems with resource-limited computers. We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer. We propose a novel incremental deep policy gradient method -- Action Value Gradient (AVG) and a set of normalization and scaling techniques to address the challenges of instability in incremental learning. On robotic simulation benchmarks, we show that AVG is the only incremental method that learns effectively, often achieving final performance comparable to batch policy gradient methods. This advancement enabled us to show for the first time effective deep reinforcement learning with real robots using only incremental updates, employing a robotic manipulator and a mobile robot.
comment: In The Thirty-eighth Annual Conference on Neural Information Processing Systems. Source code at https://github.com/gauthamvasan/avg and companion video at https://youtu.be/cwwuN6Hyew0
Learning-Enabled Iterative Convex Optimization for Safety-Critical Model Predictive Control
Safety remains a central challenge in control of dynamical systems, particularly when the boundaries of unsafe sets are complex or unknown. This paper proposes a learning-enabled framework for safety-critical Model Predictive Control (MPC) that integrates Discrete-Time High-Order Control Barrier Functions (DHOCBFs) with iterative convex optimization. Unlike existing methods that primarily address CBFs of relative degree one with fully known unsafe set boundaries, our approach generalizes to arbitrary relative degrees and addresses scenarios where the unsafe set boundaries must be inferred. We extract pixel-based data specifically from unsafe set boundaries and train a neural network to approximate local linearizations of these boundaries. The learned models are incorporated into the linearized DHOCBF constraints at each time step, enabling real-time constraint satisfaction within the MPC framework. An iterative convex optimization procedure is developed to accelerate computation while maintaining formal safety guarantees. The benefits of computational performance and safe avoidance of obstacles with diverse shapes are examined and confirmed through numerical results. By bridging model-based control with learning-based environment modeling, this framework advances safe autonomy for discrete-time systems operating in complex and partially known settings.
comment: 17 pages, 12 figures. arXiv admin note: text overlap with arXiv:2210.04361
Optimal Privacy-Aware Stochastic Sampling
This paper presents a stochastic sampling framework for privacy-aware data sharing, where a sensor observes a process correlated with private information. A sampler determines whether to retain or discard sensor observations, balancing the tradeoff between data utility and privacy. Retained samples are shared with an adversary who may attempt to infer the private process, with privacy leakage quantified using mutual information. The sampler design is formulated as an optimization problem with two objectives: $\left(\romannumeral1\right)$ minimizing the reconstruction error of the observed process using the sampler's output, $\left(\romannumeral2\right)$ reducing the privacy leakages. For a general class of processes, we show that the optimal reconstruction policy is deterministic and derive the optimality conditions for the sampling policy using a dynamic decomposition method, which enables the sampler to control the adversary's belief about private inputs. For linear Gaussian processes, we propose a simplified design by restricting the sampling policy to a specific collection, providing analytical expressions for the reconstruction error, belief state, and sampling objectives based on conditional means and covariances. Additionally, we develop a numerical optimization algorithm to optimize the sampling and reconstruction policies, wherein the policy gradient theorem for the optimal sampling design is derived based on the implicit function theorem. Simulations demonstrate the effectiveness of the proposed method in achieving accurate state reconstruction, privacy protection, and data size reduction.
Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR
The driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper proposes a trajectory planning method for intelligent vehicles based on the risk reachable set to further improve the safety of trajectory planning. First, we construct the reachable set incorporating the driving risk field to more accurately assess and avoid potential risks in drivable areas. Then, the initial trajectory is generated based on safe reinforcement learning and projected onto the reachable set. Finally, we introduce a trajectory planning method based on a constrained iterative quadratic regulator to optimize the initial solution, ensuring that the planned trajectory achieves optimal comfort, safety, and efficiency. We conduct simulation tests of trajectory planning in high-speed lane-changing scenarios. The results indicate that the proposed method can guarantee trajectory comfort and driving efficiency, with the generated trajectory situated outside high-risk boundaries, thereby ensuring vehicle safety during operation.
comment: We sincerely request the withdrawal of this paper. After further research and review, we have found that certain parts of the content contain uncertainties and are not sufficient to support the conclusions previously drawn. To avoid any potential misunderstanding or misguidance to the research community, we have decided to voluntarily withdraw the manuscript
Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control
To address the computational challenges of Model Predictive Control (MPC), recent research has studied on using Deep Neural Networks (DNNs) trained through imitation learning to approximate the MPC. However, this introduces a common issue in learning-based control: the simulation-to-reality (sim-to-real) gap. Therefore, Domain Randomization (DR) has been widely used to mitigate this gap by introducing perturbations in the source domain. However, this led to low data collection efficiency and an overly conservative control strategy. This study proposes a new control framework that deals with this issue from a control perspective inspired by Robust Tube MPC. The framework ensures the DNN operates in the same environment as the source domain, handling the sim-to-real gap with great data collection efficiency. Moreover, a parameter governor is introduced to address the DNN's inability to adapt to model parameter variations, enabling the system to satisfy MPC constraints more robustly under changing conditions. The proposed framework was validated through a cart-pole system case study compared by DR baselines, demonstrating that a single MPC-demonstrated trajectory in the source domain was sufficient for controlling the cart-pole in the target domain. Furthermore, the system effectively handled model parameter variations, allowing for a less conservative control.
comment: I withdraw my previously written paper in order to further develop my research and revise its content
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 50 pages. 5 figures. AAAI 2025 MARW Best Paper Award
Optimal Control of Nonlinear Systems with Unknown Dynamics
This paper presents a data-driven method for finding a closed-loop optimal controller, which minimizes a specified infinite-horizon cost function for systems with unknown dynamics given any arbitrary initial state. Suppose the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework, which approximates the gradient of the cost function with respect to the policy parameters via integrating the Koopman operator with the classical concept of actor-critic. This enables the policy parameters to be tuned iteratively using gradient descent to achieve an optimal controller, leveraging the linearity of the Koopman operator. The convergence analysis of the proposed framework is provided. The effectiveness of the method is demonstrated through comparisons with a model-free reinforcement learning approach, and its control performance is further evaluated through simulations against model-based optimal control methods that solve the same optimal control problem utilizing the exact system dynamics.
Deep Koopman Learning using Noisy Data
This paper proposes a data-driven framework to learn a finite-dimensional approximation of a Koopman operator for approximating the state evolution of a dynamical system under noisy observations. To this end, our proposed solution has two main advantages. First, the proposed method only requires the measurement noise to be bounded. Second, the proposed method modifies the existing deep Koopman operator formulations by characterizing the effect of the measurement noise on the Koopman operator learning and then mitigating it by updating the tunable parameter of the observable functions of the Koopman operator, making it easy to implement. The performance of the proposed method is demonstrated on several standard benchmarks. We then compare the presented method with similar methods proposed in the latest literature on Koopman learning.
Systems and Control (EESS)
Distributionally Robust Planning of Hydrogen-Electrical Microgrids for Sea Islands
This paper presents a distributionally robust planning method for hydrogen-electrical microgrids over islands, where the cross-island energy exchange is supported by a maritime hydrogen transport network. This planning problem is complicated due to heterogeneous off-shore wind-driven uncertainties (i.e., renewable power, transport availability, demand fluctuations, and grid faulting), a subset of which exhibit endogenous uncertainty, as they can be affected by proactive measures (e.g., grid hardening) or infrastructure investment. To capture these features, a two-stage distributionally robust optimization (DRO) model is developed considering decision-dependent uncertainty (DDU), which encompasses variation of the underlying distributional ambiguity due to the change of the first stage decisions. Notably, the complete recourse property is missing, which is often neglected in existing DRO studies. Nevertheless, different from the case for land-based microgrids, this issue is critical and fundamental for sea island systems due to their particular physical and logistical requirements. To address these issues, we develop a C&CG algorithm that is customized with strong cutting planes to handle DRO with a varying DDU ambiguity set and feasibility requirements. Numerical results demonstrate the cost-effectiveness and resilience of the proposed planning framework, along with the nontrivial improvements of the algorithm in both solution accuracy and computational efficiency.
Path Planning Algorithm Comparison Analysis for Wireless AUVs Energy Sharing System
Autonomous underwater vehicles (AUVs) are increasingly used in marine research, military applications, and undersea exploration. However, their operational range is significantly affected by battery performance. In this paper, a framework for a wireless energy sharing system among AUVs is proposed, enabling rapid energy replenishment. Path planning plays a crucial role in the energy-sharing process and autonomous navigation, as it must generate feasible trajectories toward designated goals. This article focuses on efficient obstacle avoidance in complex underwater environments, including irregularly shaped obstacles and narrow passages. The proposed method combines Rapidly-exploring Random Trees Star (RRT*) with Particle Swarm Optimization (PSO) to improve path planning efficiency. Comparative analysis of the two algorithms is presented through simulation results in both random and irregular obstacle environments. Index Terms: Wireless charging, autonomous underwater vehicles (AUVs), path planning, irregular obstacles, narrow passages, RRT*, particle swarm optimization (PSO).
comment: IECON 2023- 49th Annual Conference of the IEEE Industrial Electronics Society
Deep Learning for Continuous-time Stochastic Control with Jumps
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
Self-powered smart contact lenses: a multidisciplinary approach to micro-scale energy and 900 MHz - 1.1 GHz bandwidth microfabricated loop antennas communication systems
Smart contact lenses are at the forefront of integrating microelectronics, biomedical engineering, and optics into wearable technologies. This work addresses a key obstacle in their development: achieving autonomous power without compromising safety or miniaturization. We examine energy harvesting strategies using intrinsic ocular sources-particularly tear salinity and eyelid motion-to enable sustainable operation without external batteries. The study emphasizes compact loop antennas operating between 900 MHz and 1.1 GHz as critical for wireless data transmission and power management. Material choices, signal integrity, and biocompatibility are also discussed. By presenting recent advances in 3D-printed optics, antenna integration, and energy systems, we propose a conceptual framework for the next generation of smart lenses, enabling real-time health monitoring and vision enhancement through self-powered, compact devices.
comment: 16 pages, 3 figures
World Models as Reference Trajectories for Rapid Motor Adaptation
Deploying learned control policies in real-world environments poses a fundamental challenge. When system dynamics change unexpectedly, performance degrades until models are retrained on new data. We introduce Reflexive World Models (RWM), a dual control framework that uses world model predictions as implicit reference trajectories for rapid adaptation. Our method separates the control problem into long-term reward maximization through reinforcement learning and robust motor execution through rapid latent control. This dual architecture achieves significantly faster adaptation with low online computational cost compared to model-based RL baselines, while maintaining near-optimal performance. The approach combines the benefits of flexible policy learning through reinforcement learning with rapid error correction capabilities, providing a principled approach to maintaining performance in high-dimensional continuous control tasks under varying dynamics.
Decreasing Utilization of Systems with Multi-Rate Cause-Effect Chains While Reducing End-to-End Latencies
The Logical Execution Time (LET) model has deterministic properties which dramatically reduce the complexity of analyzing temporal requirements of multi-rate cause-effect chains. The configuration (length and position) of task's communication intervals directly define which task instances propagate data through the chain and affect end-to-end latencies. Since not all task instances propagate data through the chain, the execution of these instances wastes processing resources. By manipulating the configuration of communication intervals, it is possible to control which task instances are relevant for data propagation and end-to-end latencies. However, since tasks can belong to more than one cause-effect chain, the problem of configuring communication intervals becomes non-trivial given the large number of possible configurations. In this paper, we present a method to decrease the waste of processing resources while reducing end-to-end latencies. We use a search algorithm to analyze different communication interval configurations and find the combination that best decrease system utilization while reducing end-to-end latencies. By controlling data propagation by means of precedence constraints, our method modifies communication intervals and controls which task instances affect end-to-end latencies. Despite the sporadic release time of some task instances during the analysis, our method transforms those instances into periodic tasks. We evaluate our work using synthetic task sets and the automotive benchmark proposed by BOSCH for the WATERS industrial challenge.
comment: Accepted and to appear in ISORC25 - 28th IEEE International Symposium On Real-Time Distributed Computing. May 2025
SpanTrain: Highly Efficient Cross-domain Model Distributed Training System under Heterogeneous GPUs and Networks in CEE Environment
Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model training system. We propose SpanTrain, a geo-distributed model training system tailored for heterogeneous GPUs and networks in CEE environments. SpanTrain adopts a communication-centric design philosophy to tackle challenges arising from slow and unstable inter-region networks. It begins with a heterogeneous device profiler that identifies and groups devices based on both network and compute characteristics. Leveraging device groups, SpanTrain implements compact, zero-bubble pipeline parallelism, automatically deriving optimal parallel strategies. To further adapt to runtime variability, SpanTrain integrates a dynamic environment adapter that reacts to network fluctuations. Extensive evaluations demonstrate that SpanTrain achieves 1.3-2.8x higher training throughput compared to widely used and SOTA training systems.
Continuous-time iterative linear-quadratic regulator
We present a continuous-time equivalent to the well-known iterative linear-quadratic algorithm including an implementation of a backtracking line-search policy and a novel regularization approach based on the necessary conditions in the Riccati pass of the linear-quadratic regulator. This allows the algorithm to effectively solve trajectory optimization problems with non-convex cost functions, which is demonstrated on the cart-pole swing-up problem. The algorithm compatibility with state-of-the-art suites of numerical integration solvers allows for the use of high-order adaptive-step methods. Their use results in a variable number of time steps both between passes of the algorithm and across iterations, maintaining a balance between the number of function evaluations and the discretization error.
comment: 6 pages, 3 figures, submitted March 31, 2025, to Decision and Control (CDC 2025)
From learning to safety: A Direct Data-Driven Framework for Constrained Control
Ensuring safety in the sense of constraint satisfaction for learning-based control is a critical challenge, especially in the model-free case. While safety filters address this challenge in the model-based setting by modifying unsafe control inputs, they typically rely on predictive models derived from physics or data. This reliance limits their applicability for advanced model-free learning control methods. To address this gap, we propose a new optimization-based control framework that determines safe control inputs directly from data. The benefit of the framework is that it can be updated through arbitrary model-free learning algorithms to pursue optimal performance. As a key component, the concept of direct data-driven safety filters (3DSF) is first proposed. The framework employs a novel safety certificate, called the state-action control barrier function (SACBF). We present three different schemes to learn the SACBF. Furthermore, based on input-to-state safety analysis, we present the error-to-state safety analysis framework, which provides formal guarantees on safety and recursive feasibility even in the presence of learning inaccuracies. The proposed control framework bridges the gap between model-free learning-based control and constrained control, by decoupling performance optimization from safety enforcement. Simulations on vehicle control illustrate the superior performance regarding constraint satisfaction and task achievement compared to model-based methods and reward shaping.
Certified Neural Approximations of Nonlinear Dynamics
Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it outperforms the state-of-the-art. Furthermore, we highlight the flexibility of our framework by applying it to two novel scenarios not previously explored in this context: neural network compression and an autoencoder-based deep learning architecture for learning Koopman operators, both yielding compelling results.
comment: first and second author contributed equally
AI-based Decision Support System for Heritage Aircraft Corrosion Prevention
The paper presents a decision support system for the long-term preservation of aeronautical heritage exhibited/stored in sheltered sites. The aeronautical heritage is characterized by diverse materials of which this heritage is constituted. Heritage aircraft are made of ancient aluminum alloys, (ply)wood, and particularly fabrics. The decision support system (DSS) designed, starting from a conceptual model, is knowledge-based on degradation/corrosion mechanisms of prevailing materials of aeronautical heritage. In the case of historical aircraft wooden parts, this knowledge base is filled in by the damage function models developed within former European projects. Model-based corrosion prediction is implemented within the new DSS for ancient aluminum alloys. The novelty of this DSS consists of supporting multi-material heritage protection and tailoring to peculiarities of aircraft exhibition/storage hangars and the needs of aviation museums. The novel DSS is tested on WWII aircraft heritage exhibited in the Aviation Museum Kbely, Military History Institute Prague, Czech Republic.
comment: 6 pages, 4 figures, 4 tables, submitted January 31, 2025, to Process Control 2025
FAV-NSS: An HIL Framework for Accelerating Validation of Automotive Network Security Strategies
Complex electronic control unit (ECU) architectures, software models and in-vehicle networks are consistently improving safety and comfort functions in modern vehicles. However, the extended functionality and increased connectivity introduce new security risks and vulnerabilities that can be exploited on legacy automotive networks such as the controller area network (CAN). With the rising complexity of vehicular systems and attack vectors, the need for a flexible hardware-in-the-loop (HIL) test fixture that can inject attacks and validate the performance of countermeasures in near-real-world conditions in real time is vital. This paper presents an FPGA-based HIL framework tailored towards validating network security approaches (IDS, IPS) and smart integration strategies of such capabilities for an automotive CAN bus. FAV-NSS replicates an actual vehicular system environment with functional ECUs and network infrastructure on an FPGA, allowing functional validation of IDS/IPS algorithms, accelerator designs and integration schemes (software task on ECU, dedicated accelerator). To show the efficacy of FAV-NSS, we evaluate an IDS accelerator integration problem, both as a traditional coupled accelerator (to the ECU), and secondly close to the CAN controller (mimicking an extended CAN controller). We show that the latter strategy can be fully validated by our framework, which would otherwise require integration of specialised CAN modules into otherwise standard HIL fixtures with ability to instrument internal signals for characterising timing performance. The tests demonstrate a promising latency reduction of 6.3x when compared to the traditional coupled accelerator. Our case study demonstrates the potential of FAV-NSS for accelerating the optimisation, integration and verification of smart ECUs and communication controllers in current and future vehicular systems.
comment: Accepted to 2025 IEEE 36th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Impact of Wind Generation on Risk-based Security Assessment of Power System
The electric power system is one of the largest and most intricate infrastructures. Therefore, it is critical to assess and maintain its security. A power system security assessment is indispensable for identifying post-contingency issues, taking corrective measures, and protecting the system from blackouts. This paper examined the impact of wind generation on the risk-based security assessment of a power transmission network in the context of planning. DIgSILENT PowerFactory software was used to conduct the analysis using a combination of the brute force technique and the nonsequential Monte Carlo (MC) simulation method on the IEEE 39-bus transmission test system. Optimal power flow (OPF) was used to quantify security, considering (N-1), (N-2), and (N-3) line outages and an (N-1) bus outage. Moreover, the average cost deviation from the mean optimal system operating cost was proposed as a novel security indicator. The results obtianed accurately depicted the effects of changing wind generation levels on system security in terms of risk. The most and least critical line(s) and bus in the system, for different wind generation levels, were also determined. Moreover, the worst-case wind-generation threshold level using two different cost functions for wind was identified.
A Risk-Based Probabilistic Transient Stability Approach for Ranking of Circuit Breakers in a Power System
Power systems are getting more complex than ever and are consequently operating close to their limit of stability. Moreover, with the increasing demand of renewable wind generation, and the requirement to maintain a secure power system, the importance of transient stability cannot be overestimated. Current deterministic industry practices of transient stability assessment ignore the probability of variables involved. With increasing system uncertainties and widespread electricity market deregulation, there is a strong inevitability to incorporate probabilistic transient stability analysis. Circuit breakers play a critical role in fault clearing and consequently in determining the system transient stability. It is important that they undergo timely and appropriate maintenance procedures based on some criterion. Considering the need of incorporating risk in modern power systems, this paper proposes a risk-based probabilistic transient stability approach for ranking of circuit breakers in a power system. A novel priority index was proposed to rank the circuit breakers based on the system transient stability risk. DIgSILENT PowerFactory software was used to conduct the required simulations on IEEE 14 bus system. The proposed risk-based framework was deemed to be efficient in identification of the circuit breakers based on their priority rank index which can aid in power system planning process.
Reliable Vertical Federated Learning in 5G Core Network Architecture
This work proposes a new algorithm to mitigate model generalization loss in Vertical Federated Learning (VFL) operating under client reliability constraints within 5G Core Networks (CNs). Recently studied and endorsed by 3GPP, VFL enables collaborative and load-balanced model training and inference across the CN. However, the performance of VFL significantly degrades when the Network Data Analytics Functions (NWDAFs) - which serve as primary clients for VFL model training and inference - experience reliability issues stemming from resource constraints and operational overhead. Unlike edge environments, CN environments adopt fundamentally different data management strategies, characterized by more centralized data orchestration capabilities. This presents opportunities to implement better distributed solutions that take full advantage of the CN data handling flexibility. Leveraging this flexibility, we propose a method that optimizes the vertical feature split among clients while centrally defining their local models based on reliability metrics. Our empirical evaluation demonstrates the effectiveness of our proposed algorithm, showing improved performance over traditional baseline methods.
comment: Globecom Submission
Comparing Parameterizations and Objective Functions for Maximizing the Volume of Zonotopic Invariant Sets
In formal safety verification, many proposed algorithms use parametric set representations and convert the computation of the relevant sets into an optimization problem; consequently, the choice of parameterization and objective function have a significant impact on the efficiency and accuracy of the resulting computation. In particular, recent papers have explored the use of zonotope set representations for various types of invariant sets. In this paper we collect two zonotope parameterizations that are numerically well-behaved and demonstrate that the volume of the corresponding zonotopes is log-concave in the parameters. We then experimentally explore the use of these two parameterizations in an algorithm for computing the maximum volume zonotope invariant under affine dynamics within a specified box constraint over a finite horizon. The true volume of the zonotopes is used as an objective function, along with two alternative heuristics that are faster to compute. We conclude that the heuristics are much faster in practice, although the relative quality of their results declines as the dimension of the problem increases; however, our conclusions are only preliminary due to so-far limited availability of compute resources.
A Neural Network Approach to a Modified Quadratic Boost Multiport Resonant Converter for Electric Vehicle Chargers
This topology can achieve a high step-up gain by utilizing a switched capacitor and switched inductor-based VMC network arrangement.Furthermore, the proposed topology can achieve an output gain of approximately three times at a nominal duty ratio with reduced voltage and current stress across the switch, and enhance the maximum efficiency to 96.7
comment: 17 Pages, 21 figures
Co-optimize condenser water temperature and cooling tower fan using high-fidelity synthetic data
This paper introduces a novel method for optimizing HVAC systems in buildings by integrating a high-fidelity physics-based simulation model with machine learning and measured data. The method enables a real-time building advisory system that provides optimized settings for condenser water loop operation, assisting building operators in decision-making. The building and its HVAC system are first modeled using eQuest. Synthetic data are then generated by running the simulation multiple times. The data are then processed, cleaned, and used to train the machine learning model. The machine learning model enables real-time optimization of the condenser water loop using particle swarm optimization. The results deliver both a real-time online optimizer and an offline operation look-up table, providing optimized condenser water temperature settings and the optimal number of cooling tower fans at a given cooling load. Potential savings are calculated by comparing measured data from two summer months with the energy costs the building would have experienced under optimized settings. Adaptive model refinement is applied to further improve accuracy and effectiveness by utilizing available measured data. The method bridges the gap between simulation and real-time control. It has the potential to be applied to other building systems, including the chilled water loop, heating systems, ventilation systems, and other related processes. Combining physics models, data models, and measured data also enables performance analysis, tracking, and retrofit recommendations.
Green Hacks: Generating Sustainability-Targeting Attacks For Cyber-Physical Systems
Sustainability-targeting attacks (STA) or "Green Hacks" are a growing threat to cyber-physical system (CPS)-based infrastructure, as its performance objectives are increasingly linked to sustainability goals. These attacks exploit the interdependence between control, energy efficiency, and environmental impact to degrade systems' overall performance. Thus, in this work, we propose a general mathematical framework for modeling such STA and derive the feasibility conditions for generating a worst-case STA on a linear CPS using a max-min formulation. A gradient ascent descent algorithm is used to construct the worst-case attack policy. We simulated the worst-case STA for a linear CPS to illustrate its impacts on the CPS performance and sustainability cost.
comment: 10 pages, 3 figures
A Distributed Local Energy Market Clearing Framework Using a Two-Loop ADMM Method
The diversity of prosumers' resources in energy communities can provide significant technical and economic benefits to both prosumers and the distribution system operator (DSO). To maximize these benefits, a coordination framework is required to address all techno-economic constraints as well as the objectives of all agents. This paper presents a fully distributed market-clearing scheme to coordinate the strategies of agents within a local energy community. In the proposed framework, prosumers, the DSO, and the local market operator (LMO) are the participating agents. The framework addresses the preferences and techno-economic constraints of all actors while preserving their privacy. The proposed model is based on a modified alternating direction method of multipliers (ADMM) method with two outer and inner loops; the outer loop models the interactions between the LMO and prosumers, while the inner loop addresses the interactions between the LMO and the DSO. The model is demonstrated on IEEE-69bus test network, showcasing its effectiveness from various perspectives.
comment: This publication is based upon work supported by King Abdullah University of Science and Technology under Award No. RFS-OFP2023-5505. The publication has been accepted at the 2025 IEEE PowerTech conference in Germany
A Quantum-Enhanced Power Flow and Optimal Power Flow based on Combinatorial Reformulation
This study introduces the Adiabatic Quantum Power Flow (AQPF) and Adiabatic Quantum Optimal Power Flow (AQOPF) algorithms to solve power flow (PF) and optimal power flow (OPF) problems, respectively. These algorithms utilize a novel combinatorial optimization reformulation of classical PF and OPF problems, and hence, enable their implementation on Ising machines, e.g., quantum and quantum-inspired hardware. The experiments are conducted on standard test cases ranging from 4-bus to 1354-bus systems, using D-Wave's Advantage system (QA), D-Wave's quantum-classical hybrid solver (HA), Fujitsu's Digital Annealer V3 (DAv3), and Fujitsu's Quantum-Inspired Integrated Optimization software (QIIO). The annealers are systematically evaluated based on: (i) full and partitioned formulations, (ii) ability to handle ill-conditioned cases, and (iii) scalability. The results are benchmarked against the Newton-Raphson numerical method (NR) and suggest that AQPF and AQOPF can serve as effective solvers or complementary tools to classical methods to address unsolved challenges in large-scale modern power systems.
comment: 10 pages, 1 pseudo code, 2 figures, 4 tables
Extremum Seeking for PDE Systems using Physics-Informed Neural Networks
Extremum Seeking (ES) is an effective real-time optimization method for PDE systems in cascade with nonlinear quadratic maps. To address PDEs in the feedback loop, a boundary control law and a re-design of the additive probing signal are mandatory. The latter, commonly called "trajectory generation" or "motion planning," involves designing perturbation signals that anticipate their propagation through PDEs. Specifically, this requires solving motion planning problems for systems governed by parabolic and hyperbolic PDEs. Physics-Informed Neural Networks (PINN) is a powerful tool for solving PDEs by embedding physical laws as constraints in the neural network's loss function, enabling efficient solutions for high-dimensional, nonlinear, and complex problems. This paper proposes a novel construction integrating PINN and ES, automating the motion planning process for specific PDE systems and eliminating the need for case-by-case analytical derivations. The proposed strategy efficiently extracts perturbation signals, optimizing the PDE system.
comment: 23 pages, 16 figures
Gaussian Processes in Power Systems: Techniques, Applications, and Future Works
The increasing integration of renewable energy sources (RESs) and distributed energy resources (DERs) has significantly heightened operational complexity and uncertainty in modern power systems. Concurrently, the widespread deployment of smart meters, phasor measurement units (PMUs) and other sensors has generated vast spatiotemporal data streams, enabling advanced data-driven analytics and decision-making in grid operations. In this context, Gaussian processes (GPs) have emerged as a powerful probabilistic framework, offering uncertainty quantification, non-parametric modeling, and predictive capabilities to enhance power system analysis and control. This paper presents a comprehensive review of GP techniques and their applications in power system operation and control. GP applications are reviewed across three key domains: GP-based modeling, risk assessment, and optimization and control. These areas serve as representative examples of how GP can be utilized in power systems. Furthermore, critical challenges in GP applications are discussed, and potential research directions are outlined to facilitate future power system operations.
Constant-Sum High-Order Barrier Functions for Safety Between Parallel Boundaries
This paper takes a step towards addressing the difficulty of constructing Control Barrier Functions (CBFs) for parallel safety boundaries. A single CBF for both boundaries has been reported to be difficult to validate for safety, and we identify why this challenge is inherent. To overcome this, the proposed method constructs separate CBFs for each boundary. We begin by presenting results for the relative degree one case and then extend these to higher relative degrees using the CBF backstepping technique, establishing conditions that guarantee safety. Finally, we showcase our method by applying it to a unicycle system, deriving a simple, verifiable condition to validate the target CBFs for direct implementation of our results.
comment: Submitted to IEEE L-CSS and 2025 Conference on Decision and Control (CDC)
Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios
In collaborative human-robot environments, the unpredictable and dynamic nature of human motion can lead to situations where collisions become unavoidable. In such cases, it is essential for the robotic system to proactively mitigate potential harm through intelligent control strategies. This paper presents a hierarchical control framework based on Control Barrier Functions (CBFs) designed to ensure safe and adaptive operation of autonomous robotic manipulators during close-proximity human-robot interaction. The proposed method introduces a relaxation variable that enables real-time prioritization of safety constraints, allowing the robot to dynamically manage collision risks based on the criticality of different parts of the human body. A secondary constraint mechanism is incorporated to resolve infeasibility by increasing the priority of imminent threats. The framework is experimentally validated on a Franka Research 3 robot equipped with a ZED2i AI camera for real-time human pose and body detection. Experimental results confirm that the CBF-based controller, integrated with depth sensing, facilitates responsive and safe human-robot collaboration, while providing detailed risk analysis and maintaining robust performance in highly dynamic settings.
Development of a Scaled Setup for Experimental Study of the Effect of Lateral Dynamics on Energy Consumption in Electric Vehicles: An Extension
Most of the existing state-of-the-art approaches for energy consumption analysis do not account for the effect of lateral dynamics on energy consumption in electric vehicles (EVs) during vehicle maneuvers. This paper aims to validate this effect through an experimental study. We develop a scaled model using a radio-controlled (RC) car, modified to achieve dynamic similitude with on-road vehicles, to conduct scaled experiments. The experimental results confirm the impact of lateral dynamics on both energy demand and driving range in electric vehicles, aligning with our previous findings [1], and emphasize the need to incorporate these factors into energy consumption models. This is an extended version of a paper accepted at IEEE ITEC 2025. It includes additional results and analysis.
Probabilistic State Estimation of Timed Probabilistic Discrete Event Systems via Artificial Neural Networks [Draft Version]
This paper is about the state estimation of timed probabilistic discrete event systems. The main contribution is to propose general procedures for developing state estimation approaches based on artificial neural networks. It is assumed that no formal model of the system exists but a data set is available, which contains the history of the timed behaviour of the systems. This dataset will be exploited to develop a neural network model that uses both logical and temporal information gathered during the functioning of the system as inputs and provides the state probability vector as output. Two main approaches are successively proposed (i) state estimation of timed probabilistic discrete event systems over observations: in this case the state estimate is reconstructed at the occurrence of each new observation; (ii) state estimation of timed probabilistic discrete event systems over time: in this case the state estimate is reconstructed at each clock time increment. For each approach, the paper outlines the process of data preprocessing, model building and implementation. This paper not only proposes groundbreaking approaches but also opens the door to further exploitation of artificial neural networks for the benefit of discrete event systems.
Vanishing Stacked-Residual PINN for State Reconstruction of Hyperbolic Systems
In a more connected world, modeling multi-agent systems with hyperbolic partial differential equations (PDEs) offers a compact, physics-consistent description of collective dynamics. However, classical control tools need adaptation for these complex systems. Physics-informed neural networks (PINNs) provide a powerful framework to fix this issue by inferring solutions to PDEs by embedding governing equations into the neural network. A major limitation of original PINNs is their inability to capture steep gradients and discontinuities in hyperbolic PDEs. To tackle this problem, we propose a stacked residual PINN method enhanced with a vanishing viscosity mechanism. Initially, a basic PINN with a small viscosity coefficient provides a stable, low-fidelity solution. Residual correction blocks with learnable scaling parameters then iteratively refine this solution, progressively decreasing the viscosity coefficient to transition from parabolic to hyperbolic PDEs. Applying this method to traffic state reconstruction improved results by an order of magnitude in relative $\mathcal{L}^2$ error, demonstrating its potential to accurately estimate solutions where original PINNs struggle with instability and low fidelity.
Controlling a Social Network of Individuals with Coevolving Actions and Opinions
In this paper, we consider a population of individuals who have actions and opinions, which coevolve, mutually influencing one another on a complex network structure. In particular, we formulate a control problem for this social network, in which we assume that we can inject into the network a committed minority -- a set of stubborn nodes -- with the objective of steering the population, initially at a consensus, to a different consensus state. Our study focuses on two main objectives: i) determining the conditions under which the committed minority succeeds in its goal, and ii) identifying the optimal placement for such a committed minority. After deriving general monotone convergence result for the controlled dynamics, we leverage these results to build a computationally-efficient algorithm to solve the first problem and an effective heuristics for the second problem, which we prove to be NP-complete. For both algorithms, we establish theoretical guarantees. The proposed methodology is illustrated though academic examples, and demonstrated on a real-world case study.
comment: 12 pages, 6 figures. Under Review
Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints
This paper studies the remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the \emph{semantics of information} and consider that the remote actuator has different tolerances for the estimation errors. We aim to find an optimal scheduling policy that minimizes the long-term \textit{state-dependent} costs of estimation errors under a transmission frequency constraint. The optimal scheduling problem is formulated as a \emph{constrained Markov decision process} (CMDP). We show that the optimal Lagrangian cost follows a piece-wise linear and concave (PWLC) function, and the optimal policy is, at most, a randomized mixture of two simple deterministic policies. By exploiting the structural results, we develop a new \textit{intersection search} algorithm that finds the optimal policy using only a few iterations. We further propose a reinforcement learning (RL) algorithm to compute the optimal policy without knowing \textit{a priori} the channel and source statistics. To avoid the ``curse of dimensionality" in MDPs, we propose an online low-complexity \textit{drift-plus-penalty} (DPP) algorithm. Numerical results show that continuous transmission is inefficient, and remarkably, our semantic-aware policies can attain the optimum by strategically utilizing fewer transmissions by exploiting the timing of the important information.
comment: Accepted by the IEEE Transactions on Communications
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers
Modern deep policy gradient methods achieve effective performance on simulated robotic tasks, but they all require large replay buffers or expensive batch updates, or both, making them incompatible for real systems with resource-limited computers. We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer. We propose a novel incremental deep policy gradient method -- Action Value Gradient (AVG) and a set of normalization and scaling techniques to address the challenges of instability in incremental learning. On robotic simulation benchmarks, we show that AVG is the only incremental method that learns effectively, often achieving final performance comparable to batch policy gradient methods. This advancement enabled us to show for the first time effective deep reinforcement learning with real robots using only incremental updates, employing a robotic manipulator and a mobile robot.
comment: In The Thirty-eighth Annual Conference on Neural Information Processing Systems. Source code at https://github.com/gauthamvasan/avg and companion video at https://youtu.be/cwwuN6Hyew0
Learning-Enabled Iterative Convex Optimization for Safety-Critical Model Predictive Control
Safety remains a central challenge in control of dynamical systems, particularly when the boundaries of unsafe sets are complex or unknown. This paper proposes a learning-enabled framework for safety-critical Model Predictive Control (MPC) that integrates Discrete-Time High-Order Control Barrier Functions (DHOCBFs) with iterative convex optimization. Unlike existing methods that primarily address CBFs of relative degree one with fully known unsafe set boundaries, our approach generalizes to arbitrary relative degrees and addresses scenarios where the unsafe set boundaries must be inferred. We extract pixel-based data specifically from unsafe set boundaries and train a neural network to approximate local linearizations of these boundaries. The learned models are incorporated into the linearized DHOCBF constraints at each time step, enabling real-time constraint satisfaction within the MPC framework. An iterative convex optimization procedure is developed to accelerate computation while maintaining formal safety guarantees. The benefits of computational performance and safe avoidance of obstacles with diverse shapes are examined and confirmed through numerical results. By bridging model-based control with learning-based environment modeling, this framework advances safe autonomy for discrete-time systems operating in complex and partially known settings.
comment: 17 pages, 12 figures. arXiv admin note: text overlap with arXiv:2210.04361
Optimal Privacy-Aware Stochastic Sampling
This paper presents a stochastic sampling framework for privacy-aware data sharing, where a sensor observes a process correlated with private information. A sampler determines whether to retain or discard sensor observations, balancing the tradeoff between data utility and privacy. Retained samples are shared with an adversary who may attempt to infer the private process, with privacy leakage quantified using mutual information. The sampler design is formulated as an optimization problem with two objectives: $\left(\romannumeral1\right)$ minimizing the reconstruction error of the observed process using the sampler's output, $\left(\romannumeral2\right)$ reducing the privacy leakages. For a general class of processes, we show that the optimal reconstruction policy is deterministic and derive the optimality conditions for the sampling policy using a dynamic decomposition method, which enables the sampler to control the adversary's belief about private inputs. For linear Gaussian processes, we propose a simplified design by restricting the sampling policy to a specific collection, providing analytical expressions for the reconstruction error, belief state, and sampling objectives based on conditional means and covariances. Additionally, we develop a numerical optimization algorithm to optimize the sampling and reconstruction policies, wherein the policy gradient theorem for the optimal sampling design is derived based on the implicit function theorem. Simulations demonstrate the effectiveness of the proposed method in achieving accurate state reconstruction, privacy protection, and data size reduction.
Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR
The driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper proposes a trajectory planning method for intelligent vehicles based on the risk reachable set to further improve the safety of trajectory planning. First, we construct the reachable set incorporating the driving risk field to more accurately assess and avoid potential risks in drivable areas. Then, the initial trajectory is generated based on safe reinforcement learning and projected onto the reachable set. Finally, we introduce a trajectory planning method based on a constrained iterative quadratic regulator to optimize the initial solution, ensuring that the planned trajectory achieves optimal comfort, safety, and efficiency. We conduct simulation tests of trajectory planning in high-speed lane-changing scenarios. The results indicate that the proposed method can guarantee trajectory comfort and driving efficiency, with the generated trajectory situated outside high-risk boundaries, thereby ensuring vehicle safety during operation.
comment: We sincerely request the withdrawal of this paper. After further research and review, we have found that certain parts of the content contain uncertainties and are not sufficient to support the conclusions previously drawn. To avoid any potential misunderstanding or misguidance to the research community, we have decided to voluntarily withdraw the manuscript
Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control
To address the computational challenges of Model Predictive Control (MPC), recent research has studied on using Deep Neural Networks (DNNs) trained through imitation learning to approximate the MPC. However, this introduces a common issue in learning-based control: the simulation-to-reality (sim-to-real) gap. Therefore, Domain Randomization (DR) has been widely used to mitigate this gap by introducing perturbations in the source domain. However, this led to low data collection efficiency and an overly conservative control strategy. This study proposes a new control framework that deals with this issue from a control perspective inspired by Robust Tube MPC. The framework ensures the DNN operates in the same environment as the source domain, handling the sim-to-real gap with great data collection efficiency. Moreover, a parameter governor is introduced to address the DNN's inability to adapt to model parameter variations, enabling the system to satisfy MPC constraints more robustly under changing conditions. The proposed framework was validated through a cart-pole system case study compared by DR baselines, demonstrating that a single MPC-demonstrated trajectory in the source domain was sufficient for controlling the cart-pole in the target domain. Furthermore, the system effectively handled model parameter variations, allowing for a less conservative control.
comment: I withdraw my previously written paper in order to further develop my research and revise its content
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 50 pages. 5 figures. AAAI 2025 MARW Best Paper Award
Optimal Control of Nonlinear Systems with Unknown Dynamics
This paper presents a data-driven method for finding a closed-loop optimal controller, which minimizes a specified infinite-horizon cost function for systems with unknown dynamics given any arbitrary initial state. Suppose the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework, which approximates the gradient of the cost function with respect to the policy parameters via integrating the Koopman operator with the classical concept of actor-critic. This enables the policy parameters to be tuned iteratively using gradient descent to achieve an optimal controller, leveraging the linearity of the Koopman operator. The convergence analysis of the proposed framework is provided. The effectiveness of the method is demonstrated through comparisons with a model-free reinforcement learning approach, and its control performance is further evaluated through simulations against model-based optimal control methods that solve the same optimal control problem utilizing the exact system dynamics.
Deep Koopman Learning using Noisy Data
This paper proposes a data-driven framework to learn a finite-dimensional approximation of a Koopman operator for approximating the state evolution of a dynamical system under noisy observations. To this end, our proposed solution has two main advantages. First, the proposed method only requires the measurement noise to be bounded. Second, the proposed method modifies the existing deep Koopman operator formulations by characterizing the effect of the measurement noise on the Koopman operator learning and then mitigating it by updating the tunable parameter of the observable functions of the Koopman operator, making it easy to implement. The performance of the proposed method is demonstrated on several standard benchmarks. We then compare the presented method with similar methods proposed in the latest literature on Koopman learning.
Multiagent Systems
Evolutionary Computation and Large Language Models: A Survey of Methods, Synergies, and Applications
Integrating Large Language Models (LLMs) and Evolutionary Computation (EC) represents a promising avenue for advancing artificial intelligence by combining powerful natural language understanding with optimization and search capabilities. This manuscript explores the synergistic potential of LLMs and EC, reviewing their intersections, complementary strengths, and emerging applications. We identify key opportunities where EC can enhance LLM training, fine-tuning, prompt engineering, and architecture search, while LLMs can, in turn, aid in automating the design, analysis, and interpretation of ECs. The manuscript explores the synergistic integration of EC and LLMs, highlighting their bidirectional contributions to advancing artificial intelligence. It first examines how EC techniques enhance LLMs by optimizing key components such as prompt engineering, hyperparameter tuning, and architecture search, demonstrating how evolutionary methods automate and refine these processes. Secondly, the survey investigates how LLMs improve EC by automating metaheuristic design, tuning evolutionary algorithms, and generating adaptive heuristics, thereby increasing efficiency and scalability. Emerging co-evolutionary frameworks are discussed, showcasing applications across diverse fields while acknowledging challenges like computational costs, interpretability, and algorithmic convergence. The survey concludes by identifying open research questions and advocating for hybrid approaches that combine the strengths of EC and LLMs.
SwarmDiff: Swarm Robotic Trajectory Planning in Cluttered Environments via Diffusion Transformer
Swarm robotic trajectory planning faces challenges in computational efficiency, scalability, and safety, particularly in complex, obstacle-dense environments. To address these issues, we propose SwarmDiff, a hierarchical and scalable generative framework for swarm robots. We model the swarm's macroscopic state using Probability Density Functions (PDFs) and leverage conditional diffusion models to generate risk-aware macroscopic trajectory distributions, which then guide the generation of individual robot trajectories at the microscopic level. To ensure a balance between the swarm's optimal transportation and risk awareness, we integrate Wasserstein metrics and Conditional Value at Risk (CVaR). Additionally, we introduce a Diffusion Transformer (DiT) to improve sampling efficiency and generation quality by capturing long-range dependencies. Extensive simulations and real-world experiments demonstrate that SwarmDiff outperforms existing methods in computational efficiency, trajectory validity, and scalability, making it a reliable solution for swarm robotic trajectory planning.
Object-centric Processes with Structured Data and Exact Synchronization (Extended Version)
Real-world processes often involve interdependent objects that also carry data values, such as integers, reals, or strings. However, existing process formalisms fall short to combine key modeling features, such as tracking object identities, supporting complex datatypes, handling dependencies among them, and object-aware synchronization. Object-centric Petri nets with identifiers (OPIDs) partially address these needs but treat objects as unstructured identifiers (e.g., order and item IDs), overlooking the rich semantics of complex data values (e.g., item prices or other attributes). To overcome these limitations, we introduce data-aware OPIDs (DOPIDs), a framework that strictly extends OPIDs by incorporating structured data manipulation capabilities, and full synchronization mechanisms. In spite of the expressiveness of the model, we show that it can be made operational: Specifically, we define a novel conformance checking approach leveraging satisfiability modulo theories (SMT) to compute data-aware object-centric alignments.
comment: This is an extended version of the paper "Object-centric Processes with Structured Data and Exact Synchronization: Formal Modelling and Conformance Checking" to be published at CAiSE 2025
Fault-Tolerant Multi-Robot Coordination with Limited Sensing within Confined Environments
As robots are increasingly deployed to collaborate on tasks within shared workspaces and resources, the failure of an individual robot can critically affect the group's performance. This issue is particularly challenging when robots lack global information or direct communication, relying instead on social interaction for coordination and to complete their tasks. In this study, we propose a novel fault-tolerance technique leveraging physical contact interactions in multi-robot systems, specifically under conditions of limited sensing and spatial confinement. We introduce the "Active Contact Response" (ACR) method, where each robot modulates its behavior based on the likelihood of encountering an inoperative (faulty) robot. Active robots are capable of collectively repositioning stationary and faulty peers to reduce obstructions and maintain optimal group functionality. We implement our algorithm in a team of autonomous robots, equipped with contact-sensing and collision-tolerance capabilities, tasked with collectively excavating cohesive model pellets. Experimental results indicate that the ACR method significantly improves the system's recovery time from robot failures, enabling continued collective excavation with minimal performance degradation. Thus, this work demonstrates the potential of leveraging local, social, and physical interactions to enhance fault tolerance and coordination in multi-robot systems operating in constrained and extreme environments.
comment: 15 pages, 4 figures. Accepted to DARS 2024 (Distributed Autonomous Robotic Systems), to appear in Springer Proceedings in Advanced Robotics
Toward Task Capable Active Matter: Learning to Avoid Clogging in Confined Collectives via Collisions
Social organisms which construct nests consisting of tunnels and chambers necessarily navigate confined and crowded conditions. Unlike low-density collectives like bird flocks and insect swarms, in which hydrodynamic and statistical phenomena dominate, the physics of glasses and supercooled fluids is important to understand clogging behaviors in high-density collectives. Our previous work revealed that fire ants flowing in confined tunnels utilize diverse behaviors like unequal workload distributions, spontaneous direction reversals, and limited interaction times to mitigate clogging and jamming and thus maintain functional flow; implementation of similar rules in a small robophysical swarm led to high performance through spontaneous dissolution of clogs and clusters. However, how the insects learn such behaviors, and how we can develop "task capable" active matter in such regimes, remains a challenge in part because interaction dynamics are dominated by local, time-consuming collisions and no single agent can guide the entire collective. Here, we hypothesized that effective flow and clog mitigation could emerge purely through local learning. We tasked small groups of robots with pellet excavation in a narrow tunnel, allowing them to modify reversal probabilities over time. Initially, robots had equal probabilities and clogs were common. Reversals improved flow. When reversal probabilities adapted via collisions and noisy tunnel length estimates, workload inequality and performance improved. Our robophysical study of an excavating swarm shows that, despite the seeming complexity and difficulty of the task, simple learning rules can mitigate or leverage unavoidable features in task-capable dense active matter, leading to hypotheses for dense biological and robotic swarms.
comment: 13 pages, 9 figures. Published in Frontiers in Physics, Social Physics section. Includes experimental and simulation analysis of multi-robot excavation using decentralized learning
Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization
Recently, many approaches, such as Chain-of-Thought (CoT) prompting and Multi-Agent Debate (MAD), have been proposed to further enrich Large Language Models' (LLMs) complex problem-solving capacities in reasoning scenarios. However, these methods may fail to solve complex problems due to the lack of ability to find optimal solutions. Swarm Intelligence has been serving as a powerful tool for finding optima in the field of traditional optimization problems. To this end, we propose integrating swarm intelligence into the reasoning process by introducing a novel Agent-based Swarm Intelligence (ASI) paradigm. In this paradigm, we formulate LLM reasoning as an optimization problem and use a swarm intelligence scheme to guide a group of LLM-based agents in collaboratively searching for optimal solutions. To avoid swarm intelligence getting trapped in local optima, we further develop a Swarm Intelligence Enhancing Reasoning (SIER) framework, which develops a density-driven strategy to enhance the reasoning ability. To be specific, we propose to perform kernel density estimation and non-dominated sorting to optimize both solution quality and diversity simultaneously. In this case, SIER efficiently enhances solution space exploration through expanding the diversity of the reasoning path. Besides, a step-level quality evaluation is used to help agents improve solution quality by correcting low-quality intermediate steps. Then, we use quality thresholds to dynamically control the termination of exploration and the selection of candidate steps, enabling a more flexible and efficient reasoning process. Extensive experiments are ...
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities on Capture-The-Flag (CTF) competitions, they have two key limitations: accessing latest cybersecurity expertise beyond training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into the task-solving automation can tackle these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtained an accuracy of 22% on NYU CTF Bench, outperforming prior works by 3% and achieving state-of-the-art results. On evaluation of MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. We make our framework open source to public https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.
Two-person Positive Shortest Path Games Have Nash Equilibria in Pure Stationary Strategies
We prove that every finite two-person shortest path game, where the local cost of every move is positive for each player, has a Nash equilibrium (NE) in pure stationary strategies, which can be computed in polynomial time. We also extend the existence result to infinite graphs with finite out-degrees. Moreover, our proof gives that a terminal NE (in which the play is a path from the initial position to a terminal) exists provided at least one of the two players can guarantee reaching a terminal. If none of the players can do it, in other words, if each of the two players has a strategy that separates all terminals from the initial position $s$, then, obviously, a cyclic NE exists, although its cost is infinite for both players, since we restrict ourselves to positive games. We conjecture that a terminal NE exists too, provided there exists a directed path from $s$ to a terminal. However, this is open. We extend our result to short paths interdiction games, where at each vertex, we allow one player to block some of the arcs and the other player to choose one of the non-blocked arcs. Assuming that blocking sets are chosen from an independence system given by an oracle, we give an algorithm for computing a NE in time $O(|E|(\log|V|+\tau))$, where $V$ is the set of vertices, $E$ is the set of arcs, and $\tau$ is the maximum time taken by the oracle on any input.
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 50 pages. 5 figures. AAAI 2025 MARW Best Paper Award
OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration
Agents powered by advanced large language models (LLMs) have demonstrated impressive capabilities across diverse complex applications. Recently, Multi-Agent Systems (MAS), wherein multiple agents collaborate and communicate with each other, have exhibited enhanced capabilities in complex tasks, such as high-quality code generation and arithmetic reasoning. However, the development of such systems often relies on handcrafted methods, and the literature on systematic design and optimization of LLM-based MAS remains limited. In this work, we introduce OMAC, a general framework designed for holistic optimization of LLM-based MAS. Specifically, we identify five key optimization dimensions for MAS, encompassing both agent functionality and collaboration structure. Building upon these dimensions, we first propose a general algorithm, utilizing two actors termed the Semantic Initializer and the Contrastive Comparator, to optimize any single dimension. Then, we present an algorithm for joint optimization across multiple dimensions. Extensive experiments demonstrate the superior performance of OMAC on code generation, arithmetic reasoning, and general reasoning tasks against state-of-the-art approaches.
Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor
Previous studies that have formulated multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have primarily used value-based RL methods. However, recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, RL methods remain largely untested for real-world normally signal timing plans because of the simplifying assumptions common in the literature. The current study attempts to address these gaps and formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has a centralized-critic architecture under a centralized training and decentralized execution framework. Agents are designed to allow selection and implementation of up to eight signal phases, as commonly implemented in field controllers. The formulated algorithm is tested on a simulated real-world seven intersection corridor. The speed of convergence for each agent was found to depend on the size of the action space, which depends on the number and sequence of signal phases. The performance of the formulated MA-PPO adaptive control algorithm is compared with the field implemented actuated-coordinated signal control (ASC), modeled using PTV-Vissim-MaxTime software in the loop simulation (SILs). The trained MA-PPO performed significantly better than the ASC for all movements. Compared to ASC the MA-PPO showed 2% and 24% improvements in travel time in the primary and secondary coordination directions, respectively. For cross streets movements MA-PPO also showed significant crossing time reductions. Volume sensitivity experiments revealed that the formulated MA-PPO demonstrated good stability, robustness, and adaptability to changes in traffic demand.
Robotics
Traversability-aware path planning in dynamic environments
Planning in environments with moving obstacles remains a significant challenge in robotics. While many works focus on navigation and path planning in obstacle-dense spaces, traversing such congested regions is often avoidable by selecting alternative routes. This paper presents Traversability-aware FMM (Tr-FMM), a path planning method that computes paths in dynamic environments, avoiding crowded regions. The method operates in two steps: first, it discretizes the environment, identifying regions and their distribution; second, it computes the traversability of regions, aiming to minimize both obstacle risks and goal deviation. The path is then computed by propagating the wavefront through regions with higher traversability. Simulated and real-world experiments demonstrate that the approach enhances significant safety by keeping the robot away from regions with obstacles while reducing unnecessary deviations from the goal.
NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation
Autonomous robots must navigate and operate in diverse environments, from terrestrial and aquatic settings to aerial and space domains. While Reinforcement Learning (RL) has shown promise in training policies for specific autonomous robots, existing benchmarks are often constrained to unique platforms, limiting generalization and fair comparisons across different mobility systems. In this paper, we present NavBench, a multi-domain benchmark for training and evaluating RL-based navigation policies across diverse robotic platforms and operational environments. Built on IsaacLab, our framework standardizes task definitions, enabling different robots to tackle various navigation challenges without the need for ad-hoc task redesigns or custom evaluation metrics. Our benchmark addresses three key challenges: (1) Unified cross-medium benchmarking, enabling direct evaluation of diverse actuation methods (thrusters, wheels, water-based propulsion) in realistic environments; (2) Scalable and modular design, facilitating seamless robot-task interchangeability and reproducible training pipelines; and (3) Robust sim-to-real validation, demonstrated through successful policy transfer to multiple real-world robots, including a satellite robotic simulator, an unmanned surface vessel, and a wheeled ground vehicle. By ensuring consistency between simulation and real-world deployment, NavBench simplifies the development of adaptable RL-based navigation strategies. Its modular design allows researchers to easily integrate custom robots and tasks by following the framework's predefined templates, making it accessible for a wide range of applications. Our code is publicly available at NavBench.
comment: Submitted for publication. Under review (2025)
Robust Immersive Bilateral Teleoperation of Dissimilar Systems with Enhanced Transparency and Sense of Embodiment
In human-in-the-loop systems such as teleoperation, especially those involving heavy-duty manipulators, achieving high task performance requires both robust control and strong human engagement. This paper presents a bilateral teleoperation framework that enhances the operator's Sense of Embodiment (SoE), specifically, the senses of agency and self-location, through an immersive virtual reality interface and distributed haptic feedback via an exoskeleton. To support this embodiment and stablish high level of motion and force transparency, we develop a force-sensorless, robust control architecture that tackles input nonlinearities, master-slave asymmetries, unknown uncertainties, and arbitrary time delays. A human-robot augmented dynamic model is integrated into the control loop to enhance human-adaptability of the controller. Theoretical analysis confirms semi-global uniform ultimate boundedness of the closed-loop system. Extensive real-world experiments demonstrate high accuracy tracking under up to 1:13 motion scaling and 1:1000 force scaling, showcasing the significance of the results. Additionally, the stability-transparency tradeoff for motion tracking and force reflection-tracking is establish up to 150 ms of one-way fix and time-varying communication delay. The results of user study with 10 participants (9 male and 1 female) demonstrated that the system can imply a good level of SoE (76.4%), at the same time is very user friendly with no gender limitation. These results are significant given the scale and weight of the heavy-duty manipulators.
Semantically-driven Deep Reinforcement Learning for Inspection Path Planning
This paper introduces a novel semantics-aware inspection planning policy derived through deep reinforcement learning. Reflecting the fact that within autonomous informative path planning missions in unknown environments, it is often only a sparse set of objects of interest that need to be inspected, the method contributes an end-to-end policy that simultaneously performs semantic object visual inspection combined with collision-free navigation. Assuming access only to the instantaneous depth map, the associated segmentation image, the ego-centric local occupancy, and the history of past positions in the robot's neighborhood, the method demonstrates robust generalizability and successful crossing of the sim2real gap. Beyond simulations and extensive comparison studies, the approach is verified in experimental evaluations onboard a flying robot deployed in novel environments with previously unseen semantics and overall geometric configurations.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds
We present a conceptual framework for training Vision-Language Models (VLMs) to perform Visual Perspective Taking (VPT), a core capability for embodied cognition essential for Human-Robot Interaction (HRI). As a first step toward this goal, we introduce a synthetic dataset, generated in NVIDIA Omniverse, that enables supervised learning for spatial reasoning tasks. Each instance includes an RGB image, a natural language description, and a ground-truth 4X4 transformation matrix representing object pose. We focus on inferring Z-axis distance as a foundational skill, with future extensions targeting full 6 Degrees Of Freedom (DOFs) reasoning. The dataset is publicly available to support further research. This work serves as a foundational step toward embodied AI systems capable of spatial understanding in interactive human-robot scenarios.
comment: Accepted to: Intelligent Autonomous Systems (IAS) 2025 as Late Breaking Report
Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments
Path planning is crucial for the navigation of autonomous vehicles, yet these vehicles face challenges in complex and real-world environments. Although a global view may be provided, it is often outdated, necessitating the reliance of Unmanned Ground Vehicles (UGVs) on real-time local information. This reliance on partial information, without considering the global context, can lead to UGVs getting stuck in local minima. This paper develops a method to proactively predict local minima using Dynamic Bayesian filtering, based on the detected obstacles in the local view and the global goal. This approach aims to enhance the autonomous navigation of self-driving vehicles by allowing them to predict potential pitfalls before they get stuck, and either ask for help from a human, or re-plan an alternate trajectory.
Sampling-Based System Identification with Active Exploration for Legged Robot Sim2Real Learning
Sim-to-real discrepancies hinder learning-based policies from achieving high-precision tasks in the real world. While Domain Randomization (DR) is commonly used to bridge this gap, it often relies on heuristics and can lead to overly conservative policies with degrading performance when not properly tuned. System Identification (Sys-ID) offers a targeted approach, but standard techniques rely on differentiable dynamics and/or direct torque measurement, assumptions that rarely hold for contact-rich legged systems. To this end, we present SPI-Active (Sampling-based Parameter Identification with Active Exploration), a two-stage framework that estimates physical parameters of legged robots to minimize the sim-to-real gap. SPI-Active robustly identifies key physical parameters through massive parallel sampling, minimizing state prediction errors between simulated and real-world trajectories. To further improve the informativeness of collected data, we introduce an active exploration strategy that maximizes the Fisher Information of the collected real-world trajectories via optimizing the input commands of an exploration policy. This targeted exploration leads to accurate identification and better generalization across diverse tasks. Experiments demonstrate that SPI-Active enables precise sim-to-real transfer of learned policies to the real world, outperforming baselines by 42-63% in various locomotion tasks.
M3Depth: Wavelet-Enhanced Depth Estimation on Mars via Mutual Boosting of Dual-Modal Data
Depth estimation plays a great potential role in obstacle avoidance and navigation for further Mars exploration missions. Compared to traditional stereo matching, learning-based stereo depth estimation provides a data-driven approach to infer dense and precise depth maps from stereo image pairs. However, these methods always suffer performance degradation in environments with sparse textures and lacking geometric constraints, such as the unstructured terrain of Mars. To address these challenges, we propose M3Depth, a depth estimation model tailored for Mars rovers. Considering the sparse and smooth texture of Martian terrain, which is primarily composed of low-frequency features, our model incorporates a convolutional kernel based on wavelet transform that effectively captures low-frequency response and expands the receptive field. Additionally, we introduce a consistency loss that explicitly models the complementary relationship between depth map and surface normal map, utilizing the surface normal as a geometric constraint to enhance the accuracy of depth estimation. Besides, a pixel-wise refinement module with mutual boosting mechanism is designed to iteratively refine both depth and surface normal predictions. Experimental results on synthetic Mars datasets with depth annotations show that M3Depth achieves a significant 16% improvement in depth estimation accuracy compared to other state-of-the-art methods in depth estimation. Furthermore, the model demonstrates strong applicability in real-world Martian scenarios, offering a promising solution for future Mars exploration missions.
FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning
The use of guidance to steer sampling toward desired outcomes has been widely explored within diffusion models, especially in applications such as image and trajectory generation. However, incorporating guidance during training remains relatively underexplored. In this work, we introduce energy-guided flow matching, a novel approach that enhances the training of flow models and eliminates the need for guidance at inference time. We learn a conditional velocity field corresponding to the flow policy by approximating an energy-guided probability path as a Gaussian path. Learning guided trajectories is appealing for tasks where the target distribution is defined by a combination of data and an energy function, as in reinforcement learning. Diffusion-based policies have recently attracted attention for their expressive power and ability to capture multi-modal action distributions. Typically, these policies are optimized using weighted objectives or by back-propagating gradients through actions sampled by the policy. As an alternative, we propose FlowQ, an offline reinforcement learning algorithm based on energy-guided flow matching. Our method achieves competitive performance while the policy training time is constant in the number of flow sampling steps.
Unconventional Hexacopters via Evolution and Learning: Performance Gains and New Insights
Evolution and learning have historically been interrelated topics, and their interplay is attracting increased interest lately. The emerging new factor in this trend is morphological evolution, the evolution of physical forms within embodied AI systems such as robots. In this study, we investigate a system of hexacopter-type drones with evolvable morphologies and learnable controllers and make contributions to two fields. For aerial robotics, we demonstrate that the combination of evolution and learning can deliver non-conventional drones that significantly outperform the traditional hexacopter on several tasks that are more complex than previously considered in the literature. For the field of Evolutionary Computing, we introduce novel metrics and perform new analyses into the interaction of morphological evolution and learning, uncovering hitherto unidentified effects. Our analysis tools are domain-agnostic, making a methodological contribution towards building solid foundations for embodied AI systems that integrate evolution and learning.
comment: 16 pages, 14 figures, currently under review
On-Demand Scenario Generation for Testing Automated Driving Systems
The safety and reliability of Automated Driving Systems (ADS) are paramount, necessitating rigorous testing methodologies to uncover potential failures before deployment. Traditional testing approaches often prioritize either natural scenario sampling or safety-critical scenario generation, resulting in overly simplistic or unrealistic hazardous tests. In practice, the demand for natural scenarios (e.g., when evaluating the ADS's reliability in real-world conditions), critical scenarios (e.g., when evaluating safety in critical situations), or somewhere in between (e.g., when testing the ADS in regions with less civilized drivers) varies depending on the testing objectives. To address this issue, we propose the On-demand Scenario Generation (OSG) Framework, which generates diverse scenarios with varying risk levels. Achieving the goal of OSG is challenging due to the complexity of quantifying the criticalness and naturalness stemming from intricate vehicle-environment interactions, as well as the need to maintain scenario diversity across various risk levels. OSG learns from real-world traffic datasets and employs a Risk Intensity Regulator to quantitatively control the risk level. It also leverages an improved heuristic search method to ensure scenario diversity. We evaluate OSG on the Carla simulators using various ADSs. We verify OSG's ability to generate scenarios with different risk levels and demonstrate its necessity by comparing accident types across risk levels. With the help of OSG, we are now able to systematically and objectively compare the performance of different ADSs based on different risk levels.
comment: 20 pages, 9 figures. Accepted by FSE 2025
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Vision-language-action (VLA) models have shown promise as generalist robotic policies by jointly leveraging visual, linguistic, and proprioceptive modalities to generate action trajectories. While recent benchmarks have advanced VLA research in domestic tasks, professional science-oriented domains remain underexplored. We introduce AutoBio, a simulation framework and benchmark designed to evaluate robotic automation in biology laboratory environments--an application domain that combines structured protocols with demanding precision and multimodal interaction. AutoBio extends existing simulation capabilities through a pipeline for digitizing real-world laboratory instruments, specialized physics plugins for mechanisms ubiquitous in laboratory workflows, and a rendering stack that support dynamic instrument interfaces and transparent materials through physically based rendering. Our benchmark comprises biologically grounded tasks spanning three difficulty levels, enabling standardized evaluation of language-guided robotic manipulation in experimental protocols. We provide infrastructure for demonstration generation and seamless integration with VLA models. Baseline evaluations with two SOTA VLA models reveal significant gaps in precision manipulation, visual reasoning, and instruction following in scientific workflows. By releasing AutoBio, we aim to catalyze research on generalist robotic systems for complex, high-precision, and multimodal professional environments. The simulator and benchmark are publicly available to facilitate reproducible research.
Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation
Effectively utilizing multi-sensory data is important for robots to generalize across diverse tasks. However, the heterogeneous nature of these modalities makes fusion challenging. Existing methods propose strategies to obtain comprehensively fused features but often ignore the fact that each modality requires different levels of attention at different manipulation stages. To address this, we propose a force-guided attention fusion module that adaptively adjusts the weights of visual and tactile features without human labeling. We also introduce a self-supervised future force prediction auxiliary task to reinforce the tactile modality, improve data imbalance, and encourage proper adjustment. Our method achieves an average success rate of 93% across three fine-grained, contactrich tasks in real-world experiments. Further analysis shows that our policy appropriately adjusts attention to each modality at different manipulation stages. The videos can be viewed at https://adaptac-dex.github.io/.
Hypothesis on the Functional Advantages of the Selection-Broadcast Cycle Structure: Global Workspace Theory and Dealing with a Real-Time World
This paper discusses the functional advantages of the Selection-Broadcast Cycle structure proposed by Global Workspace Theory (GWT), inspired by human consciousness, particularly focusing on its applicability to artificial intelligence and robotics in dynamic, real-time scenarios. While previous studies often examined the Selection and Broadcast processes independently, this research emphasizes their combined cyclic structure and the resulting benefits for real-time cognitive systems. Specifically, the paper identifies three primary benefits: Dynamic Thinking Adaptation, Experience-Based Adaptation, and Immediate Real-Time Adaptation. This work highlights GWT's potential as a cognitive architecture suitable for sophisticated decision-making and adaptive performance in unsupervised, dynamic environments. It suggests new directions for the development and implementation of robust, general-purpose AI and robotics systems capable of managing complex, real-world tasks.
MultiDrive: A Co-Simulation Framework Bridging 2D and 3D Driving Simulation for AV Software Validation SC 2025
Scenario-based testing using simulations is a cornerstone of Autonomous Vehicles (AVs) software validation. So far, developers needed to choose between low-fidelity 2D simulators to explore the scenario space efficiently, and high-fidelity 3D simulators to study relevant scenarios in more detail, thus reducing testing costs while mitigating the sim-to-real gap. This paper presents a novel framework that leverages multi-agent co-simulation and procedural scenario generation to support scenario-based testing across low- and high-fidelity simulators for the development of motion planning algorithms. Our framework limits the effort required to transition scenarios between simulators and automates experiment execution, trajectory analysis, and visualization. Experiments with a reference motion planner show that our framework uncovers discrepancies between the planner's intended and actual behavior, thus exposing weaknesses in planning assumptions under more realistic conditions. Our framework is available at: https://github.com/TUM-AVS/MultiDrive
comment: 7 pages, Submitted to the IEEE International Conference on Intelligent Transportation Systems (ITSC 2025), Australia
Sketch Interface for Teleoperation of Mobile Manipulator to Enable Intuitive and Intended Operation: A Proof of Concept
Recent advancements in robotics have underscored the need for effective collaboration between humans and robots. Traditional interfaces often struggle to balance robot autonomy with human oversight, limiting their practical application in complex tasks like mobile manipulation. This study aims to develop an intuitive interface that enables a mobile manipulator to autonomously interpret user-provided sketches, enhancing user experience while minimizing burden. We implemented a web-based application utilizing machine learning algorithms to process sketches, making the interface accessible on mobile devices for use anytime, anywhere, by anyone. In the first validation, we examined natural sketches drawn by users for 27 selected manipulation and navigation tasks, gaining insights into trends related to sketch instructions. The second validation involved comparative experiments with five grasping tasks, showing that the sketch interface reduces workload and enhances intuitiveness compared to conventional axis control interfaces. These findings suggest that the proposed sketch interface improves the efficiency of mobile manipulators and opens new avenues for integrating intuitive human-robot collaboration in various applications.
comment: This paper has been accepted to the the 20th edition of the IEEE/ACM International Conference on Human-Robot Interaction (HRI'25), which will be held in Melbourne, Australia on March 4-6, 2025
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning
Symmetry is pervasive in robotics and has been widely exploited to improve sample efficiency in deep reinforcement learning (DRL). However, existing approaches primarily focus on spatial symmetries, such as reflection, rotation, and translation, while largely neglecting temporal symmetries. To address this gap, we explore time reversal symmetry, a form of temporal symmetry commonly found in robotics tasks such as door opening and closing. We propose Time Reversal symmetry enhanced Deep Reinforcement Learning (TR-DRL), a framework that combines trajectory reversal augmentation and time reversal guided reward shaping to efficiently solve temporally symmetric tasks. Our method generates reversed transitions from fully reversible transitions, identified by a proposed dynamics-consistent filter, to augment the training data. For partially reversible transitions, we apply reward shaping to guide learning, according to successful trajectories from the reversed task. Extensive experiments on the Robosuite and MetaWorld benchmarks demonstrate that TR-DRL is effective in both single-task and multi-task settings, achieving higher sample efficiency and stronger final performance compared to baseline methods.
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight
Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their real-world applicability. We introduce APEX (Anticipatory Physics-Enhanced Execution), a framework that equips LLMs with physics-driven foresight for real-time task planning. APEX constructs structured graphs to identify and model the most relevant dynamic interactions in the environment, providing LLMs with explicit physical state updates. Simultaneously, APEX provides low-latency forward simulations of physically feasible actions, allowing LLMs to select optimal strategies based on predictive outcomes rather than static observations. We evaluate APEX on three benchmarks designed to assess perception, prediction, and decision-making: (1) Physics Reasoning Benchmark, testing causal inference and object motion prediction; (2) Tetris, evaluating whether physics-informed prediction enhances decision-making performance in long-horizon planning tasks; (3) Dynamic Obstacle Avoidance, assessing the immediate integration of perception and action feasibility analysis. APEX significantly outperforms standard LLMs and VLM-based models, demonstrating the necessity of explicit physics reasoning for bridging the gap between language-based intelligence and real-world task execution. The source code and experiment setup are publicly available at https://github.com/hwj20/APEX_EXP .
Robotic Monitoring of Colorimetric Leaf Sensors for Precision Agriculture ICRA
Current remote sensing technologies that measure crop health e.g. RGB, multispectral, hyperspectral, and LiDAR, are indirect, and cannot capture plant stress indicators directly. Instead, low-cost leaf sensors that directly interface with the crop surface present an opportunity to advance real-time direct monitoring. To this end, we co-design a sensor-detector system, where the sensor is a novel colorimetric leaf sensor that directly measures crop health in a precision agriculture setting, and the detector autonomously obtains optical signals from these leaf sensors. This system integrates a ground robot platform with an on-board monocular RGB camera and object detector to localize the leaf sensor, and a hyperspectral camera with motorized mirror and an on-board halogen light to acquire a hyperspectral reflectance image of the leaf sensor, from which a spectral response characterizing crop health can be extracted. We show a successful demonstration of our co-designed system operating in outdoor environments, obtaining spectra that are interpretable when compared to controlled laboratory-grade spectrometer measurements. The system is demonstrated in row-crop environments both indoors and outdoors where it is able to autonomously navigate, locate and obtain a hyperspectral image of all leaf sensors present, and retrieve interpretable spectral resonance from leaf sensors.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision
A comprehensive understanding of 3D scenes is essential for autonomous vehicles (AVs), and among various perception tasks, occupancy estimation plays a central role by providing a general representation of drivable and occupied space. However, most existing occupancy estimation methods rely on LiDAR or cameras, which perform poorly in degraded environments such as smoke, rain, snow, and fog. In this paper, we propose 4D-ROLLS, the first weakly supervised occupancy estimation method for 4D radar using the LiDAR point cloud as the supervisory signal. Specifically, we introduce a method for generating pseudo-LiDAR labels, including occupancy queries and LiDAR height maps, as multi-stage supervision to train the 4D radar occupancy estimation model. Then the model is aligned with the occupancy map produced by LiDAR, fine-tuning its accuracy in occupancy estimation. Extensive comparative experiments validate the exceptional performance of 4D-ROLLS. Its robustness in degraded environments and effectiveness in cross-dataset training are qualitatively demonstrated. The model is also seamlessly transferred to downstream tasks BEV segmentation and point cloud occupancy prediction, highlighting its potential for broader applications. The lightweight network enables 4D-ROLLS model to achieve fast inference speeds at about 30 Hz on a 4060 GPU. The code of 4D-ROLLS will be made available at https://github.com/CLASS-Lab/4D-ROLLS.
Learning to Insert for Constructive Neural Vehicle Routing Solver
Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the idea of insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance the flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm's flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.
Certifiably Safe Manipulation of Deformable Linear Objects via Joint Shape and Tension Prediction ICRA 2025
Manipulating deformable linear objects (DLOs) is challenging due to their complex dynamics and the need for safe interaction in contact-rich environments. Most existing models focus on shape prediction alone and fail to account for contact and tension constraints, which can lead to damage to both the DLO and the robot. In this work, we propose a certifiably safe motion planning and control framework for DLO manipulation. At the core of our method is a predictive model that jointly estimates the DLO's future shape and tension. These predictions are integrated into a real-time trajectory optimizer based on polynomial zonotopes, allowing us to enforce safety constraints throughout the execution. We evaluate our framework on a simulated wire harness assembly task using a 7-DOF robotic arm. Compared to state-of-the-art methods, our approach achieves a higher task success rate while avoiding all safety violations. The results demonstrate that our method enables robust and safe DLO manipulation in contact-rich environments.
comment: Accepted to ICRA 2025 Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
Leveraging pretrained Vision-Language Models (VLMs) to map language instruction and visual observations to raw low-level actions, Vision-Language-Action models (VLAs) hold great promise for achieving general-purpose robotic systems. Despite their advancements, existing VLAs tend to spuriously correlate task-irrelevant visual features with actions, limiting their generalization capacity beyond the training data. To tackle this challenge, we propose Intrinsic Spatial Reasoning (InSpire), a simple yet effective approach that mitigates the adverse effects of spurious correlations by boosting the spatial reasoning ability of VLAs. Specifically, InSpire redirects the VLA's attention to task-relevant factors by prepending the question "In which direction is the [object] relative to the robot?" to the language instruction and aligning the answer "right/left/up/down/front/back/grasped" and predicted actions with the ground-truth. Notably, InSpire can be used as a plugin to enhance existing autoregressive VLAs, requiring no extra training data or interaction with other large models. Extensive experimental results in both simulation and real-world environments demonstrate the effectiveness and flexibility of our approach. Our code, pretrained models and demos are publicly available at: https://Koorye.github.io/proj/Inspire.
Safety2Drive: Safety-Critical Scenario Benchmark for the Evaluation of Autonomous Driving
Autonomous Driving (AD) systems demand the high levels of safety assurance. Despite significant advancements in AD demonstrated on open-source benchmarks like Longest6 and Bench2Drive, existing datasets still lack regulatory-compliant scenario libraries for closed-loop testing to comprehensively evaluate the functional safety of AD. Meanwhile, real-world AD accidents are underrepresented in current driving datasets. This scarcity leads to inadequate evaluation of AD performance, posing risks to safety validation and practical deployment. To address these challenges, we propose Safety2Drive, a safety-critical scenario library designed to evaluate AD systems. Safety2Drive offers three key contributions. (1) Safety2Drive comprehensively covers the test items required by standard regulations and contains 70 AD function test items. (2) Safety2Drive supports the safety-critical scenario generalization. It has the ability to inject safety threats such as natural environment corruptions and adversarial attacks cross camera and LiDAR sensors. (3) Safety2Drive supports multi-dimensional evaluation. In addition to the evaluation of AD systems, it also supports the evaluation of various perception tasks, such as object detection and lane detection. Safety2Drive provides a paradigm from scenario construction to validation, establishing a standardized test framework for the safe deployment of AD.
Enhancing Robot Navigation Policies with Task-Specific Uncertainty Managements
Robots navigating complex environments must manage uncertainty from sensor noise, environmental changes, and incomplete information, with different tasks requiring varying levels of precision in different areas. For example, precise localization may be crucial near obstacles but less critical in open spaces. We present GUIDE (Generalized Uncertainty Integration for Decision-Making and Execution), a framework that integrates these task-specific requirements into navigation policies via Task-Specific Uncertainty Maps (TSUMs). By assigning acceptable uncertainty levels to different locations, TSUMs enable robots to adapt uncertainty management based on context. When combined with reinforcement learning, GUIDE learns policies that balance task completion and uncertainty management without extensive reward engineering. Real-world tests show significant performance gains over methods lacking task-specific uncertainty awareness.
Duawlfin: A Drone with Unified Actuation for Wheeled Locomotion and Flight Operation
This paper presents Duawlfin, a drone with unified actuation for wheeled locomotion and flight operation that achieves efficient, bidirectional ground mobility. Unlike existing hybrid designs, Duawlfin eliminates the need for additional actuators or propeller-driven ground propulsion by leveraging only its standard quadrotor motors and introducing a differential drivetrain with one-way bearings. This innovation simplifies the mechanical system, significantly reduces energy usage, and prevents the disturbance caused by propellers spinning near the ground, such as dust interference with sensors. Besides, the one-way bearings minimize the power transfer from motors to propellers in the ground mode, which enables the vehicle to operate safely near humans. We provide a detailed mechanical design, present control strategies for rapid and smooth mode transitions, and validate the concept through extensive experimental testing. Flight-mode tests confirm stable aerial performance comparable to conventional quadcopters, while ground-mode experiments demonstrate efficient slope climbing (up to 30{\deg}) and agile turning maneuvers approaching 1g lateral acceleration. The seamless transitions between aerial and ground modes further underscore the practicality and effectiveness of our approach for applications like urban logistics and indoor navigation. All the materials including 3-D model files, demonstration video and other assets are open-sourced at https://sites.google.com/view/Duawlfin.
comment: 8 pages, 8 figures
Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams
Achieving coordinated teamwork among legged robots requires both fine-grained locomotion control and long-horizon strategic decision-making. Robot soccer offers a compelling testbed for this challenge, combining dynamic, competitive, and multi-agent interactions. In this work, we present a hierarchical multi-agent reinforcement learning (MARL) framework that enables fully autonomous and decentralized quadruped robot soccer. First, a set of highly dynamic low-level skills is trained for legged locomotion and ball manipulation, such as walking, dribbling, and kicking. On top of these, a high-level strategic planning policy is trained with Multi-Agent Proximal Policy Optimization (MAPPO) via Fictitious Self-Play (FSP). This learning framework allows agents to adapt to diverse opponent strategies and gives rise to sophisticated team behaviors, including coordinated passing, interception, and dynamic role allocation. With an extensive ablation study, the proposed learning method shows significant advantages in the cooperative and competitive multi-agent soccer game. We deploy the learned policies to real quadruped robots relying solely on onboard proprioception and decentralized localization, with the resulting system supporting autonomous robot-robot and robot-human soccer matches on indoor and outdoor soccer courts.
comment: 11 pages, 12 figures
C*: A Coverage Path Planning Algorithm for Unknown Environments using Rapidly Covering Graphs
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. The C* algorithm is built upon the concept of Rapidly Covering Graph (RCGs). The RCG is constructed incrementally via progressive sampling during robot navigation, which eliminates the need for cellular decomposition of the search space. The RCG has a sparse-graph structure formed by efficient sampling and pruning techniques, which produces non-myopic waypoints of the coverage trajectory. While C* produces the desired back and forth coverage pattern, it adapts to the TSP-based locally optimal coverage of small uncovered regions, called coverage holes, that are surrounded by obstacles and covered regions. Thus, C* proactively detects and covers the coverage holes in situ, which reduces the coverage time by preventing the longer return trajectories from distant regions to cover such holes later. The algorithmic simplicity and low computational complexity of C* makes it easy to implement and suitable for real-time onboard applications. It is analytically proven that C* provides complete coverage of unknown environments. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) real laboratory experiments using autonomous robots. A comparative evaluation with seven existing CPP methods demonstrate that C* yields significant performance improvements in terms of coverage time, number of turns, trajectory length and overlap ratio, while preventing the formation of coverage holes. Finally, C* is evaluated on two different applications of CPP using 1) energy-constrained robots and 2) multi-robot teams.
Flattening Hierarchies with Policy Bootstrapping
Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail.
RoboCulture: A Robotics Platform for Automated Biological Experimentation
Automating biological experimentation remains challenging due to the need for millimeter-scale precision, long and multi-step experiments, and the dynamic nature of living systems. Current liquid handlers only partially automate workflows, requiring human intervention for plate loading, tip replacement, and calibration. Industrial solutions offer more automation but are costly and lack the flexibility needed in research settings. Meanwhile, research in autonomous robotics has yet to bridge the gap for long-duration, failure-sensitive biological experiments. We introduce RoboCulture, a cost-effective and flexible platform that uses a general-purpose robotic manipulator to automate key biological tasks. RoboCulture performs liquid handling, interacts with lab equipment, and leverages computer vision for real-time decisions using optical density-based growth monitoring. We demonstrate a fully autonomous 15-hour yeast culture experiment where RoboCulture uses vision and force feedback and a modular behavior tree framework to robustly execute, monitor, and manage experiments.
Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning
Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.
PCA-DDReach: Efficient Statistical Reachability Analysis of Stochastic Dynamical Systems via Principal Component Analysis
This study presents a scalable data-driven algorithm designed to efficiently address the challenging problem of reachability analysis. Analysis of cyber-physical systems (CPS) relies typically on parametric physical models of dynamical systems. However, identifying parametric physical models for complex CPS is challenging due to their complexity, uncertainty, and variability, often rendering them as black-box oracles. As an alternative, one can treat these complex systems as black-box models and use trajectory data sampled from the system (e.g., from high-fidelity simulators or the real system) along with machine learning techniques to learn models that approximate the underlying dynamics. However, these machine learning models can be inaccurate, highlighting the need for statistical tools to quantify errors. Recent advancements in the field include the incorporation of statistical uncertainty quantification tools such as conformal inference (CI) that can provide probabilistic reachable sets with provable guarantees. Recent work has even highlighted the ability of these tools to address the case where the distribution of trajectories sampled during training time are different from the distribution of trajectories encountered during deployment time. However, accounting for such distribution shifts typically results in more conservative guarantees. This is undesirable in practice and motivates us to present techniques that can reduce conservatism. Here, we propose a new approach that reduces conservatism and improves scalability by combining conformal inference with Principal Component Analysis (PCA). We show the effectiveness of our technique on various case studies, including a 12-dimensional quadcopter and a 27-dimensional hybrid system known as the powertrain.
Think, Reflect, Create: Metacognitive Learning for Zero-Shot Robotic Planning with LLMs
While large language models (LLMs) have shown great potential across various domains, their applications in robotics remain largely limited to static, prompt-based behaviors and still face challenges in handling complex tasks under zero-shot or few-shot settings. Inspired by human metacognitive learning and creative problem-solving, we address this limitation by exploring a fundamental research question: Can LLMs be empowered with metacognitive capabilities to reason, reflect, and create, thereby enhancing their ability to perform robotic tasks with minimal demonstrations? In this paper, we present an early-stage framework that integrates metacognitive learning into LLM-powered multi-robot collaboration. The proposed framework equips the LLM-powered robotic agents with a skill decomposition and self-reflection mechanism that identifies modular skills from prior tasks, reflects on failures in unseen task scenarios, and synthesizes effective new solutions. Experimental results show that our metacognitive-learning-empowered LLM framework significantly outperforms existing baselines. Moreover, we observe that the framework is capable of generating solutions that differ from the ground truth yet still successfully complete the tasks. These exciting findings support our hypothesis that metacognitive learning can foster creativity in robotic planning.
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction
We introduce a unified approach to forecast the dynamics of human keypoints along with the motion trajectory based on a short sequence of input poses. While many studies address either full-body pose prediction or motion trajectory prediction, only a few attempt to merge them. We propose a motion transformation technique to simultaneously predict full-body pose and trajectory key-points in a global coordinate frame. We utilize an off-the-shelf 3D human pose estimation module, a graph attention network to encode the skeleton structure, and a compact, non-autoregressive transformer suitable for real-time motion prediction for human-robot interaction and human-aware navigation. We introduce a human navigation dataset ``DARKO'' with specific focus on navigational activities that are relevant for human-aware mobile robot navigation. We perform extensive evaluation on Human3.6M, CMU-Mocap, and our DARKO dataset. In comparison to prior work, we show that our approach is compact, real-time, and accurate in predicting human navigation motion across all datasets. Result animations, our dataset, and code will be available at https://nisarganc.github.io/UPTor-page/
comment: Project page: https://nisarganc.github.io/UPTor-page/
A Hierarchical Graph-Based Terrain-Aware Autonomous Navigation Approach for Complementary Multimodal Ground-Aerial Exploration
Autonomous navigation in unknown environments is a fundamental challenge in robotics, particularly in coordinating ground and aerial robots to maximize exploration efficiency. This paper presents a novel approach that utilizes a hierarchical graph to represent the environment, encoding both geometric and semantic traversability. The framework enables the robots to compute a shared confidence metric, which helps the ground robot assess terrain and determine when deploying the aerial robot will extend exploration. The robot's confidence in traversing a path is based on factors such as predicted volumetric gain, path traversability, and collision risk. A hierarchy of graphs is used to maintain an efficient representation of traversability and frontier information through multi-resolution maps. Evaluated in a real subterranean exploration scenario, the approach allows the ground robot to autonomously identify zones that are no longer traversable but suitable for aerial deployment. By leveraging this hierarchical structure, the ground robot can selectively share graph information on confidence-assessed frontier targets from parts of the scene, enabling the aerial robot to navigate beyond obstacles and continue exploration.
Coordinated motion control of a wire arc additive manufacturing robotic system for multi-directional building parts
This work investigates the manufacturing of complex shapes parts with wire arc additive manufacturing (WAAM). In order to guarantee the integrity and quality of each deposited layer that composes the final piece, the deposition process is usually carried out in a flat position. However, for complex geometry parts with non-flat surfaces, this strategy causes unsupported overhangs and staircase effect, which contribute to a poor surface finishing. Generally, the build direction is not constant for every deposited section or layer in complex geometry parts. As a result, there is an additional concern to ensure the build direction is aligned with gravity, thus improving the quality of the final part. This paper proposes an algorithm to control the torch motion with respect to a deposition substrate as well as the torch orientation with respect to an inertial frame. The control scheme is based on task augmentation applied to an extended kinematic chain composed by two robots, which constitutes a coordinated control problem, and allows the deposition trajectory to be planned with respect to the deposition substrate coordinate frame while aligning each layer buildup direction with gravity (or any other direction defined for an inertial frame). Parts with complex geometry aspects have been produced in a WAAM cell composed by two robots (a manipulator with a welding torch and a positioning table holding the workpiece) in order to validate the proposed approach.
DORA: Object Affordance-Guided Reinforcement Learning for Dexterous Robotic Manipulation
Dexterous robotic manipulation remains a longstanding challenge in robotics due to the high dimensionality of control spaces and the semantic complexity of object interaction. In this paper, we propose an object affordance-guided reinforcement learning framework that enables a multi-fingered robotic hand to learn human-like manipulation strategies more efficiently. By leveraging object affordance maps, our approach generates semantically meaningful grasp pose candidates that serve as both policy constraints and priors during training. We introduce a voting-based grasp classification mechanism to ensure functional alignment between grasp configurations and object affordance regions. Furthermore, we incorporate these constraints into a generalizable RL pipeline and design a reward function that unifies affordance-awareness with task-specific objectives. Experimental results across three manipulation tasks - cube grasping, jug grasping and lifting, and hammer use - demonstrate that our affordance-guided approach improves task success rates by an average of 15.4% compared to baselines. These findings highlight the critical role of object affordance priors in enhancing sample efficiency and learning generalizable, semantically grounded manipulation policies. For more details, please visit our project website https://sites.google.com/view/dora-manip.
comment: 8 pages
Integrating Field of View in Human-Aware Collaborative Planning
In human-robot collaboration (HRC), it is crucial for robot agents to consider humans' knowledge of their surroundings. In reality, humans possess a narrow field of view (FOV), limiting their perception. However, research on HRC often overlooks this aspect and presumes an omniscient human collaborator. Our study addresses the challenge of adapting to the evolving subtask intent of humans while accounting for their limited FOV. We integrate FOV within the human-aware probabilistic planning framework. To account for large state spaces due to considering FOV, we propose a hierarchical online planner that efficiently finds approximate solutions while enabling the robot to explore low-level action trajectories that enter the human FOV, influencing their intended subtask. Through user study with our adapted cooking domain, we demonstrate our FOV-aware planner reduces human's interruptions and redundant actions during collaboration by adapting to human perception limitations. We extend these findings to a virtual reality kitchen environment, where we observe similar collaborative behaviors.
comment: Published at International Conference on Robotics and Automation (2025)
Fast and scalable multi-robot deployment planning under connectivity constraints
In this paper we develop a method to coordinate the deployment of a multi-robot team to reach some locations of interest, so-called primary goals, and to transmit the information from these positions to a static Base Station (BS), under connectivity constraints. The relay positions have to be established for some robots to maintain the connectivity at the moment in which the other robots visit the primary goals. Once every robot reaches its assigned goal, they are again available to cover new goals, dynamically re-distributing the robots to the new tasks. The contribution of this work is a two stage method to deploy the team. Firstly, clusters of relay and primary positions are computed, obtaining a tree formed by chains of positions that have to be visited. Secondly, the order for optimally assigning and visiting the goals in the clusters is computed. We analyze different heuristics for sequential and parallel deployment in the clusters, obtaining sub-optimal solutions in short time for different number of robots and for a large amount of goals.
Practical Equivalence Testing and Its Application in Synthetic Pre-Crash Scenario Validation
The use of representative pre-crash scenarios is critical for assessing the safety impact of driving automation systems through simulation. However, a gap remains in the robust evaluation of the similarity between synthetic and real-world pre-crash scenarios and their crash characteristics. Without proper validation, it cannot be ensured that the synthetic test scenarios adequately represent real-world driving behaviors and crash characteristics. One reason for this validation gap is the lack of focus on methods to confirm that the synthetic test scenarios are practically equivalent to real-world ones, given the assessment scope. Traditional statistical methods, like significance testing, focus on detecting differences rather than establishing equivalence; since failure to detect a difference does not imply equivalence, they are of limited applicability for validating synthetic pre-crash scenarios and crash characteristics. This study addresses this gap by proposing an equivalence testing method based on the Bayesian Region of Practical Equivalence (ROPE) framework. This method is designed to assess the practical equivalence of scenario characteristics that are most relevant for the intended assessment, making it particularly appropriate for the domain of virtual safety assessments. We first review existing equivalence testing methods. Then we propose and demonstrate the Bayesian ROPE-based method by testing the equivalence of two rear-end pre-crash datasets. Our approach focuses on the most relevant scenario characteristics. Our analysis provides insights into the practicalities and effectiveness of equivalence testing in synthetic test scenario validation and demonstrates the importance of testing for improving the credibility of synthetic data for automated vehicle safety assessment, as well as the credibility of subsequent safety impact assessments.
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction
Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and image information into sequential control actions. However, these models often underperform in open-world scenarios, as they are predominantly trained on successful expert demonstrations and exhibit a limited capacity for failure recovery. In this work, we present a Robotic Failure Analysis and Correction (RoboFAC) framework to address this issue. Firstly, we construct RoboFAC dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 16 diverse tasks and 53 scenes in both simulation and real-world environments. Leveraging our dataset, we develop RoboFAC model, which is capable of Task Understanding, Failure Analysis and Failure Correction. Experimental results demonstrate that the RoboFAC model outperforms GPT-4o by 34.1% on our evaluation benchmark. Furthermore, we integrate the RoboFAC model into a real-world VLA control pipeline as an external supervision providing correction instructions, yielding a 29.1% relative improvement on average on four real-world tasks. The results show that our RoboFAC framework effectively handles robotic failures and assists the VLA model in recovering from failures.
Learning Impact-Rich Rotational Maneuvers via Centroidal Velocity Rewards and Sim-to-Real Techniques: A One-Leg Hopper Flip Case Study
Dynamic rotational maneuvers, such as front flips, inherently involve large angular momentum generation and intense impact forces, presenting major challenges for reinforcement learning and sim-to-real transfer. In this work, we propose a general framework for learning and deploying impact-rich, rotation-intensive behaviors through centroidal velocity-based rewards and actuator-aware sim-to-real techniques. We identify that conventional link-level reward formulations fail to induce true whole-body rotation and introduce a centroidal angular velocity reward that accurately captures system-wide rotational dynamics. To bridge the sim-to-real gap under extreme conditions, we model motor operating regions (MOR) and apply transmission load regularization to ensure realistic torque commands and mechanical robustness. Using the one-leg hopper front flip as a representative case study, we demonstrate the first successful hardware realization of a full front flip. Our results highlight that incorporating centroidal dynamics and actuator constraints is critical for reliably executing highly dynamic motions. A supplementary video is available at: https://youtu.be/atMAVI4s1RY
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Vision-Language Navigation (VLN) is a critical task for developing embodied agents that can follow natural language instructions to navigate in complex real-world environments. Recent advances in VLN by large pretrained models have significantly improved generalization and instruction grounding compared to traditional approaches. However, the role of reasoning strategies in navigation-an action-centric, long-horizon task-remains underexplored, despite Chain-of-Thought (CoT) reasoning's demonstrated success in static tasks like visual question answering. To address this gap, we conduct the first systematic evaluation of reasoning strategies for VLN, including No-Think (direct action prediction), Pre-Think (reason before action), and Post-Think (reason after action). Surprisingly, our findings reveal the Inference-time Reasoning Collapse issue, where inference-time reasoning degrades navigation accuracy, highlighting the challenges of integrating reasoning into VLN. Based on this insight, we propose Aux-Think, a framework that trains models to internalize structured reasoning patterns through CoT supervision, while inferring action directly without reasoning in online prediction. To support this framework, we release R2R-CoT-320k, the first Chain-of-Thought annotated dataset for VLN. Extensive experiments show that Aux-Think reduces training effort greatly and achieves the best performance under the same data scale.
Reachability Barrier Networks: Learning Hamilton-Jacobi Solutions for Smooth and Flexible Control Barrier Functions
Recent developments in autonomous driving and robotics underscore the necessity of safety-critical controllers. Control barrier functions (CBFs) are a popular method for appending safety guarantees to a general control framework, but they are notoriously difficult to generate beyond low dimensions. Existing methods often yield non-differentiable or inaccurate approximations that lack integrity, and thus fail to ensure safety. In this work, we use physics-informed neural networks (PINNs) to generate smooth approximations of CBFs by computing Hamilton-Jacobi (HJ) optimal control solutions. These reachability barrier networks (RBNs) avoid traditional dimensionality constraints and support the tuning of their conservativeness post-training through a parameterized discount term. To ensure robustness of the discounted solutions, we leverage conformal prediction methods to derive probabilistic safety guarantees for RBNs. We demonstrate that RBNs are highly accurate in low dimensions, and safer than the standard neural CBF approach in high dimensions. Namely, we showcase the RBNs in a 9D multi-vehicle collision avoidance problem where it empirically proves to be 5.5x safer and 1.9x less conservative than the neural CBFs, offering a promising method to synthesize CBFs for general nonlinear autonomous systems.
comment: 15 pages, 7 figures
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models ICRA-2025
Recent progress in large language models and access to large-scale robotic datasets has sparked a paradigm shift in robotics models transforming them into generalists able to adapt to various tasks, scenes, and robot modalities. A large step for the community are open Vision Language Action models which showcase strong performance in a wide variety of tasks. In this work, we study the visual generalization capabilities of three existing robotic foundation models, and propose a corresponding evaluation framework. Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios. This is potentially caused by limited variations in the training data and/or catastrophic forgetting, leading to domain limitations in the vision foundation models. We further explore OpenVLA, which uses two pre-trained vision foundation models and is, therefore, expected to generalize to out-of-domain experiments. However, we showcase catastrophic forgetting by DINO-v2 in OpenVLA through its failure to fulfill the task of depth regression. To overcome the aforementioned issue of visual catastrophic forgetting, we propose a gradual backbone reversal approach founded on model merging. This enables OpenVLA -- which requires the adaptation of the visual backbones during initial training -- to regain its visual generalization ability. Regaining this capability enables our ReVLA model to improve over OpenVLA by a factor of 77\% and 66\% for grasping and lifting in visual OOD tasks. Comprehensive evaluations, episode rollouts and model weights are available on the ReVLA Page
comment: Accepted at ICRA-2025, Atlanta
Diffusion-Based Failure Sampling for Evaluating Safety-Critical Autonomous Systems
Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.
comment: Appears in IEEE International Conference on Engineering Reliable Autonomous Systems (ERAS) 2025
SG-Reg: Generalizable and Efficient Scene Graph Registration
This paper addresses the challenges of registering two rigid semantic scene graphs, an essential capability when an autonomous agent needs to register its map against a remote agent, or against a prior map. The hand-crafted descriptors in classical semantic-aided registration, or the ground-truth annotation reliance in learning-based scene graph registration, impede their application in practical real-world environments. To address the challenges, we design a scene graph network to encode multiple modalities of semantic nodes: open-set semantic feature, local topology with spatial awareness, and shape feature. These modalities are fused to create compact semantic node features. The matching layers then search for correspondences in a coarse-to-fine manner. In the back-end, we employ a robust pose estimator to decide transformation according to the correspondences. We manage to maintain a sparse and hierarchical scene representation. Our approach demands fewer GPU resources and fewer communication bandwidth in multi-agent tasks. Moreover, we design a new data generation approach using vision foundation models and a semantic mapping module to reconstruct semantic scene graphs. It differs significantly from previous works, which rely on ground-truth semantic annotations to generate data. We validate our method in a two-agent SLAM benchmark. It significantly outperforms the hand-crafted baseline in terms of registration success rate. Compared to visual loop closure networks, our method achieves a slightly higher registration recall while requiring only 52 KB of communication bandwidth for each query frame. Code available at: \href{http://github.com/HKUST-Aerial-Robotics/SG-Reg}{http://github.com/HKUST-Aerial-Robotics/SG-Reg}.
comment: IEEE Transactions Robotics Regular Paper
A Practical Guide for Incorporating Symmetry in Diffusion Policy
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization, however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore a number of straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of full equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study on those design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves the policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
AC-LIO: Towards Asymptotic Compensation for Distortion in LiDAR-Inertial Odometry via Selective Intra-Frame Smoothing
Existing LiDAR-Inertial Odometry (LIO) methods typically utilize the prior trajectory derived from the IMU integration to compensate for the motion distortion within LiDAR frames. However, discrepancies between the prior and true trajectory can lead to residual motion distortions that compromise the consistency of LiDAR frame with its corresponding geometric environment. This imbalance may result in pointcloud registration becoming trapped in local optima, thereby exacerbating drift during long-term and large-scale localization. To this end, we propose a novel LIO framework with selective intra-frame smoothing dubbed AC-LIO. Our core idea is to asymptotically backpropagate current update term and compensate for residual motion distortion under the guidance of convergence criteria, aiming to improve the accuracy of discrete-state LIO system with minimal computational increase. Extensive experiments demonstrate that our AC-LIO framework further enhances odometry accuracy compared to prior arts, with about 30.4% reduction in average RMSE over the second best result, leading to marked improvements in the accuracy of long-term and large-scale localization and mapping.
comment: 10 pages, 9 figures
End-to-End and Highly-Efficient Differentiable Simulation for Robotics
Over the past few years, robotics simulators have largely improved in efficiency and scalability, enabling them to generate years of simulated data in a few hours. Yet, efficiently and accurately computing the simulation derivatives remains an open challenge, with potentially high gains on the convergence speed of reinforcement learning and trajectory optimization algorithms, especially for problems involving physical contact interactions. This paper contributes to this objective by introducing a unified and efficient algorithmic solution for computing the analytical derivatives of robotic simulators. The approach considers both the collision and frictional stages, accounting for their intrinsic nonsmoothness and also exploiting the sparsity induced by the underlying multibody systems. These derivatives have been implemented in C++, and the code will be open-sourced in the Simple simulator. They depict state-of-the-art timings ranging from 5 microseconds for a 7-dof manipulator up to 95 microseconds for 36-dof humanoid, outperforming alternative solutions by a factor of at least 100.
Safe Distributed Control of Multi-Robot Systems with Communication Delays
Safe operation of multi-robot systems is critical, especially in communication-degraded environments such as underwater for seabed mapping, underground caves for navigation, and in extraterrestrial missions for assembly and construction. We address safety of networked autonomous systems where the information exchanged between robots incurs communication delays. We formalize a notion of distributed control barrier function for multi-robot systems, a safety certificate amenable to a distributed implementation, which provides formal ground to using graph neural networks to learn safe distributed controllers. Further, we observe that learning a distributed controller ignoring delays can severely degrade safety. We finally propose a predictor-based framework to train a safe distributed controller under communication delays, where the current state of nearby robots is predicted from received data and age-of-information. Numerical experiments on multi-robot collision avoidance show that our predictor-based approach can significantly improve the safety of a learned distributed controller under communication delays. A video abstract is available at https://youtu.be/Hcu1Ri32Spk.
comment: Copyright (c) 2025 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org
GRoQ-Loco: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets
Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-Loco, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations from two distinct locomotion behaviors - stair traversal (non-periodic gaits) and flat terrain traversal (periodic gaits) - collected across multiple quadruped robots, to train a generalist model that enables behavior fusion for both behaviors. Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 nuc, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70kg quadruped, on flat and outdoor terrains without requiring any fine tuning. These results highlight the potential for robust generalist locomotion across diverse robots and terrains.
comment: 18pages, 16figures, 6tables
Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images
Reliable object grasping is one of the fundamental tasks in robotics. However, determining grasping pose based on single-image input has long been a challenge due to limited visual information and the complexity of real-world objects. In this paper, we propose Triplane Grasping, a fast grasping decision-making method that relies solely on a single RGB-only image as input. Triplane Grasping creates a hybrid Triplane-Gaussian 3D representation through a point decoder and a triplane decoder, which produce an efficient and high-quality reconstruction of the object to be grasped to meet real-time grasping requirements. We propose to use an end-to-end network to generate 6-DoF parallel-jaw grasp distributions directly from 3D points in the point cloud as potential grasp contacts and anchor the grasp pose in the observed data. Experiments on the OmniObject3D and GraspNet-1Billion datasets demonstrate that our method achieves rapid modeling and grasping pose decision-making for daily objects, and strong generalization capability.
Task-oriented Robotic Manipulation with Vision Language Models
Vision Language Models (VLMs) play a crucial role in robotic manipulation by enabling robots to understand and interpret the visual properties of objects and their surroundings, allowing them to perform manipulation based on this multimodal understanding. Accurately understanding spatial relationships remains a non-trivial challenge, yet it is essential for effective robotic manipulation. In this work, we introduce a novel framework that integrates VLMs with a structured spatial reasoning pipeline to perform object manipulation based on high-level, task-oriented input. Our approach is the transformation of visual scenes into tree-structured representations that encode the spatial relations. These trees are subsequently processed by a Large Language Model (LLM) to infer restructured configurations that determine how these objects should be organised for a given high-level task. To support our framework, we also present a new dataset containing manually annotated captions that describe spatial relations among objects, along with object-level attribute annotations such as fragility, mass, material, and transparency. We demonstrate that our method not only improves the comprehension of spatial relationships among objects in the visual environment but also enables robots to interact with these objects more effectively. As a result, this approach significantly enhances spatial reasoning in robotic manipulation tasks. To our knowledge, this is the first method of its kind in the literature, offering a novel solution that allows robots to more efficiently organize and utilize objects in their surroundings.
A Probabilistic Model for Skill Acquisition with Switching Latent Feedback Controllers
Manipulation tasks often consist of subtasks, each representing a distinct skill. Mastering these skills is essential for robots, as it enhances their autonomy, efficiency, adaptability, and ability to work in their environment. Learning from demonstrations allows robots to rapidly acquire new skills without starting from scratch, with demonstrations typically sequencing skills to achieve tasks. Behaviour cloning approaches to learning from demonstration commonly rely on mixture density network output heads to predict robot actions. In this work, we first reinterpret the mixture density network as a library of feedback controllers (or skills) conditioned on latent states. This arises from the observation that a one-layer linear network is functionally equivalent to a classical feedback controller, with network weights corresponding to controller gains. We use this insight to derive a probabilistic graphical model that combines these elements, describing the skill acquisition process as segmentation in a latent space, where each skill policy functions as a feedback control law in this latent space. Our approach significantly improves not only task success rate, but also robustness to observation noise when trained with human demonstrations. Our physical robot experiments further show that the induced robustness improves model deployment on robots.
Mapless Collision-Free Flight via MPC using Dual KD-Trees in Cluttered Environments
Collision-free flight in cluttered environments is a critical capability for autonomous quadrotors. Traditional methods often rely on detailed 3D map construction, trajectory generation, and tracking. However, this cascade pipeline can introduce accumulated errors and computational delays, limiting flight agility and safety. In this paper, we propose a novel method for enabling collision-free flight in cluttered environments without explicitly constructing 3D maps or generating and tracking collision-free trajectories. Instead, we leverage Model Predictive Control (MPC) to directly produce safe actions from sparse waypoints and point clouds from a depth camera. These sparse waypoints are dynamically adjusted online based on nearby obstacles detected from point clouds. To achieve this, we introduce a dual KD-Tree mechanism: the Obstacle KD-Tree quickly identifies the nearest obstacle for avoidance, while the Edge KD-Tree provides a robust initial guess for the MPC solver, preventing it from getting stuck in local minima during obstacle avoidance. We validate our approach through extensive simulations and real-world experiments. The results show that our approach significantly outperforms the mapping-based methods and is also superior to imitation learning-based methods, demonstrating reliable obstacle avoidance at up to 12 m/s in simulations and 6 m/s in real-world tests. Our method provides a simple and robust alternative to existing methods.
Learning to Group and Grasp Multiple Objects
Simultaneously grasping and delivering multiple objects can significantly enhance robotic work efficiency and has been a key research focus for decades. The primary challenge lies in determining how to push objects, group them, and execute simultaneous grasping for respective groups while considering object distribution and the hardware constraints of the robot. Traditional rule-based methods struggle to flexibly adapt to diverse scenarios. To address this challenge, this paper proposes an imitation learning-based approach. We collect a series of expert demonstrations through teleoperation and train a diffusion policy network, enabling the robot to dynamically generate action sequences for pushing, grouping, and grasping, thereby facilitating efficient multi-object grasping and delivery. We conducted experiments to evaluate the method under different training dataset sizes, varying object quantities, and real-world object scenarios. The results demonstrate that the proposed approach can effectively and adaptively generate multi-object grouping and grasping strategies. With the support of more training data, imitation learning is expected to be an effective approach for solving the multi-object grasping problem.
RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations ICRA
Imitation learning in robotics faces significant challenges in generalization due to the complexity of robotic environments and the high cost of data collection. We introduce RoCoDA, a novel method that unifies the concepts of invariance, equivariance, and causality within a single framework to enhance data augmentation for imitation learning. RoCoDA leverages causal invariance by modifying task-irrelevant subsets of the environment state without affecting the policy's output. Simultaneously, we exploit SE(3) equivariance by applying rigid body transformations to object poses and adjusting corresponding actions to generate synthetic demonstrations. We validate RoCoDA through extensive experiments on five robotic manipulation tasks, demonstrating improvements in policy performance, generalization, and sample efficiency compared to state-of-the-art data augmentation methods. Our policies exhibit robust generalization to unseen object poses, textures, and the presence of distractors. Furthermore, we observe emergent behavior such as re-grasping, indicating policies trained with RoCoDA possess a deeper understanding of task dynamics. By leveraging invariance, equivariance, and causality, RoCoDA provides a principled approach to data augmentation in imitation learning, bridging the gap between geometric symmetries and causal reasoning. Project Page: https://rocoda.github.io
comment: Accepted to 2025 IEEE International Conference on Robotics and Automation (ICRA)
The Role of Integrity Monitoring in Connected and Automated Vehicles: Current State-of-Practice and Future Directions
Positioning integrity refers to the trust in the performance of a navigation system. Accurate and reliable position information is needed to meet the requirements of connected and Automated Vehicle (CAV) applications, particularly in safety-critical scenarios. Receiver Autonomous Integrity Monitoring (RAIM) and its variants have been widely studied for Global Navigation Satellite System (GNSS)-based vehicle positioning, often fused with kinematic (e.g., Odometry) and perception sensors (e.g., camera). However, integrity monitoring (IM) for cooperative positioning solutions leveraging Vehicle-to-Everything (V2X) communication has received comparatively limited attention. This paper reviews existing research in the field of positioning IM and identifies various research gaps. Particular attention has been placed on identifying research that highlights cooperative IM methods. It also examines key automotive safety standards and public V2X datasets to map current research priorities and uncover critical gaps. Finally, the paper outlines promising future directions, highlighting research topics aimed at advancing and benchmarking positioning integrity.
comment: This work has been submitted to the IEEE for possible publication
Multiagent Systems
Multi-agent Reinforcement Learning vs. Fixed-Time Control for Traffic Signal Optimization: A Simulation Study
Urban traffic congestion, particularly at intersections, significantly impacts travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to manage dynamic traffic patterns effectively. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. Utilizing Pygame, a simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented, in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach demonstrated statistically significant improvements, including reduced average waiting times and improved throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise for improving urban traffic management efficiency. More research is recommended to address scalability and real-world implementation challenges.
Upgrading Democracies with Fairer Voting Methods
Voting methods are instrumental design element of democracies. Citizens use them to express and aggregate their preferences to reach a collective decision. However, voting outcomes can be as sensitive to voting rules as they are to people's voting choices. Despite the significance and inter-disciplinary scientific progress on voting methods, several democracies keep relying on outdated voting methods that do not fit modern, pluralistic societies well, while lacking social innovation. Here, we demonstrate how one can upgrade real-world democracies, namely by using alternative preferential voting methods such as cumulative voting and the method of equal shares designed for a proportional representation of voters' preferences. By rigorously assessing a new participatory budgeting approach applied in the city of Aarau, Switzerland, we unravel the striking voting outcomes of fair voting methods: more winning projects with the same budget and broader geographic and preference representation of citizens by the elected projects, in particular for voters who used to be under-represented, while promoting novel project ideas. We provide profound causal evidence showing that citizens prefer proportional voting methods, which possess strong legitimacy without the need of very technical specialized explanations. We also reveal strong underlying democratic values exhibited by citizens who support fair voting methods such as altruism and compromise. These findings come with a global momentum to unleash a new and long-awaited participation blueprint of how to upgrade democracies.
comment: Includes Supplementary Information
Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy
Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent to achieve satisfactory performance on large-scale LLMs. However, these approaches face challenges to achieve comparable performance on fine-tuned lightweight LLMs, due to their limited capabilities in handling multiple complex logic. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which contains Intent Classification Agent, Slot Filling Agent and Response Agent. This approach simplifies the learning complexity and enhances the generalization ability by separating the tasks into domain-independent components. In this framework, we enhance the capabilities in contextual understanding using the Direct Preference Optimisation (DPO) method, and propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation issues during DPO training. Experiments conducted on the MultiWOZ datasets show that our proposed method achieves a better average performance among all the baselines. Extensive analysis also demonstrates that our proposed framework exhibits excellent generalizability and zero-shot capability.
MAS-KCL: Knowledge component graph structure learning with large language model-based agentic workflow
Knowledge components (KCs) are the fundamental units of knowledge in the field of education. A KC graph illustrates the relationships and dependencies between KCs. An accurate KC graph can assist educators in identifying the root causes of learners' poor performance on specific KCs, thereby enabling targeted instructional interventions. To achieve this, we have developed a KC graph structure learning algorithm, named MAS-KCL, which employs a multi-agent system driven by large language models for adaptive modification and optimization of the KC graph. Additionally, a bidirectional feedback mechanism is integrated into the algorithm, where AI agents leverage this mechanism to assess the value of edges within the KC graph and adjust the distribution of generation probabilities for different edges, thereby accelerating the efficiency of structure learning. We applied the proposed algorithm to 5 synthetic datasets and 4 real-world educational datasets, and experimental results validate its effectiveness in learning path recognition. By accurately identifying learners' learning paths, teachers are able to design more comprehensive learning plans, enabling learners to achieve their educational goals more effectively, thus promoting the sustainable development of education.
comment: In CGI 2025: 42nd Computer Graphics International Conference, Kowloon, Hong Kong, Peper No. 134
Personalized and Resilient Distributed Learning Through Opinion Dynamics
In this paper, we address two practical challenges of distributed learning in multi-agent network systems, namely personalization and resilience. Personalization is the need of heterogeneous agents to learn local models tailored to their own data and tasks, while still generalizing well; on the other hand, the learning process must be resilient to cyberattacks or anomalous training data to avoid disruption. Motivated by a conceptual affinity between these two requirements, we devise a distributed learning algorithm that combines distributed gradient descent and the Friedkin-Johnsen model of opinion dynamics to fulfill both of them. We quantify its convergence speed and the neighborhood that contains the final learned models, which can be easily controlled by tuning the algorithm parameters to enforce a more personalized/resilient behavior. We numerically showcase the effectiveness of our algorithm on synthetic and real-world distributed learning tasks, where it achieves high global accuracy both for personalized models and with malicious agents compared to standard strategies.
comment: This work has been submitted to IEEE for possible publication
Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning
Retrieval-Augmented Generation (RAG) systems empower large language models (LLMs) with external knowledge, yet struggle with efficiency-accuracy trade-offs when scaling to large knowledge graphs. Existing approaches often rely on monolithic graph retrieval, incurring unnecessary latency for simple queries and fragmented reasoning for complex multi-hop questions. To address these challenges, this paper propose SPLIT-RAG, a multi-agent RAG framework that addresses these limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The innovative framework first create Semantic Partitioning of Linked Information, then use the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation manages to divide knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types, while lightweight LLM agents are assigned to partitioned subgraphs, and only relevant partitions are activated during retrieval, thus reduce search space while enhancing efficiency. Finally, a hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications. Extensive experimental validation demonstrates considerable improvements compared to existing approaches.
comment: 20 pages, 4 figures
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation
Existing AutoML systems have advanced the automation of machine learning (ML); however, they still require substantial manual configuration and expert input, particularly when handling multimodal data. We introduce MLZero, a novel multi-agent framework powered by Large Language Models (LLMs) that enables end-to-end ML automation across diverse data modalities with minimal human intervention. A cognitive perception module is first employed, transforming raw multimodal inputs into perceptual context that effectively guides the subsequent workflow. To address key limitations of LLMs, such as hallucinated code generation and outdated API knowledge, we enhance the iterative code generation process with semantic and episodic memory. MLZero demonstrates superior performance on MLE-Bench Lite, outperforming all competitors in both success rate and solution quality, securing six gold medals. Additionally, when evaluated on our Multimodal AutoML Agent Benchmark, which includes 25 more challenging tasks spanning diverse data modalities, MLZero outperforms the competing methods by a large margin with a success rate of 0.92 (+263.6\%) and an average rank of 2.28. Our approach maintains its robust effectiveness even with a compact 8B LLM, outperforming full-size systems from existing solutions.
MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation
We present MAATS, a Multi Agent Automated Translation System that leverages the Multidimensional Quality Metrics (MQM) framework as a fine-grained signal for error detection and refinement. MAATS employs multiple specialized AI agents, each focused on a distinct MQM category (e.g., Accuracy, Fluency, Style, Terminology), followed by a synthesis agent that integrates the annotations to iteratively refine translations. This design contrasts with conventional single-agent methods that rely on self-correction. Evaluated across diverse language pairs and Large Language Models (LLMs), MAATS outperforms zero-shot and single-agent baselines with statistically significant gains in both automatic metrics and human assessments. It excels particularly in semantic accuracy, locale adaptation, and linguistically distant language pairs. Qualitative analysis highlights its strengths in multi-layered error diagnosis, omission detection across perspectives, and context-aware refinement. By aligning modular agent roles with interpretable MQM dimensions, MAATS narrows the gap between black-box LLMs and human translation workflows, shifting focus from surface fluency to deeper semantic and contextual fidelity.
Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles
Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents utility function values and gradients. However, there is an increasing number of settings, such as human AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such settings, we consider that the mediator has access only to agents most preferred directions, i.e., normalized utility gradients in the decision space. To this end, we propose a cooperative bargaining algorithm where a mediator has access to only the direction oracle of each agent. We prove that unlike popular approaches such as the Nash and Kalai Smorodinsky bargaining solutions, our approach is invariant to monotonic nonaffine transformations, and that under strong convexity and smoothness assumptions, this approach enjoys global asymptotic convergence to Pareto stationary solutions. Moreover, we show that the bargaining solutions found by our algorithm also satisfy the axioms of symmetry and (under slightly stronger conditions) independence of irrelevant alternatives, which are popular in the literature. Finally, we conduct experiments in two domains, multi agent formation assignment and mediated stock portfolio allocation, which validate these theoretic results. All code for our experiments can be found at https://github.com/suryakmurthy/dibs_bargaining.
Integration of TinyML and LargeML: A Survey of 6G and Beyond
The transition from 5G networks to 6G highlights a significant demand for machine learning (ML). Deep learning models, in particular, have seen wide application in mobile networking and communications to support advanced services in emerging wireless environments, such as smart healthcare, smart grids, autonomous vehicles, aerial platforms, digital twins, and the metaverse. The rapid expansion of Internet-of-Things (IoT) devices, many with limited computational capabilities, has accelerated the development of tiny machine learning (TinyML) and resource-efficient ML approaches for cost-effective services. However, the deployment of large-scale machine learning (LargeML) solutions require major computing resources and complex management strategies to support extensive IoT services and ML-generated content applications. Consequently, the integration of TinyML and LargeML is projected as a promising approach for future seamless connectivity and efficient resource management. Although the integration of TinyML and LargeML shows abundant potential, several challenges persist, including performance optimization, practical deployment strategies, effective resource management, and security considerations. In this survey, we review and analyze the latest research aimed at enabling the integration of TinyML and LargeML models for the realization of smart services and applications in future 6G networks and beyond. The paper concludes by outlining critical challenges and identifying future research directions for the holistic integration of TinyML and LargeML in next-generation wireless networks.
comment: This work was submitted to IEEE Communications Surveys & Tutorials
IG Parser: A Software Package for the Encoding of Institutional Statements using the Institutional Grammar
This article provides an overview of IG Parser, a software that facilitates qualitative content analysis of formal (e.g., legal) rules or informal (e.g., social) norms, and strategies (such as conventions) -- referred to as institutions -- that govern social systems and operate configurally to describe institutional systems. To this end, the IG Parser employs a distinctive syntax that ensures rigorous encoding of natural language, while automating the transformation into various formats that support the downstream analysis using diverse analytical techniques. The conceptual core of the IG Parser is an associated syntax, IG Script, that operationalizes the conceptual foundations of the Institutional Grammar, and more specifically the Institutional Grammar 2.0, an analytical paradigm for institutional analysis. This article presents the IG Parser, including its conceptual foundations, the syntax specification of IG Script, and its architectural principles. This overview is augmented with selective illustrative examples that highlight its use and the associated benefits.
comment: 24 pages
Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments
This study introduces a novel inference framework, designated as Bayesian and inverse Bayesian (BIB) inference, which concurrently performs both conventional and inverse Bayesian updates by integrating symmetry bias into Bayesian inference. The effectiveness of the model was evaluated through a sequential estimation task involving observations sampled from a Gaussian distribution with a stochastically time-varying mean. Conventional Bayesian inference entails a fundamental trade-off between adaptability to abrupt environmental shifts and estimation accuracy during stable intervals. The BIB framework addresses this limitation by dynamically modulating the learning rate through inverse Bayesian updates, thereby enhancing adaptive flexibility. The BIB model generated spontaneous bursts in the learning rate during sudden environmental transitions, transiently entering a high-sensitivity state to accommodate incoming data. This intermittent burst-relaxation pattern functions as a dynamic mechanism that balances adaptability and accuracy. Further analysis of burst interval distributions demonstrated that the BIB model consistently produced power-law distributions under diverse conditions. Such robust scaling behavior, absent in conventional Bayesian inference, appears to emerge from a self-regulatory mechanism driven by inverse Bayesian updates. These results present a novel computational perspective on scale-free phenomena in natural systems and offer implications for designing adaptive inference systems in nonstationary environments.
Steady-State Strategy Synthesis for Swarms of Autonomous Agents
Steady-state synthesis aims to construct a policy for a given MDP $D$ such that the long-run average frequencies of visits to the vertices of $D$ satisfy given numerical constraints. This problem is solvable in polynomial time, and memoryless policies are sufficient for approximating an arbitrary frequency vector achievable by a general (infinite-memory) policy. We study the steady-state synthesis problem for multiagent systems, where multiple autonomous agents jointly strive to achieve a suitable frequency vector. We show that the problem for multiple agents is computationally hard (PSPACE or NP hard, depending on the variant), and memoryless strategy profiles are insufficient for approximating achievable frequency vectors. Furthermore, we prove that even evaluating the frequency vector achieved by a given memoryless profile is computationally hard. This reveals a severe barrier to constructing an efficient synthesis algorithm, even for memoryless profiles. Nevertheless, we design an efficient and scalable synthesis algorithm for a subclass of full memoryless profiles, and we evaluate this algorithm on a large class of randomly generated instances. The experimental results demonstrate a significant improvement against a naive algorithm based on strategy sharing.
Attention Mechanism for LLM-based Agents Dynamic Diffusion under Information Asymmetry
Large language models have been used to simulate human society using multi-agent systems. Most current social simulation research emphasizes interactive behaviors in fixed environments, ignoring information opacity, relationship variability, and diffusion diversity. In this paper, we first propose a general framework for exploring multi-agent information diffusion. We identified LLMs' deficiency in the perception and utilization of social relationships, as well as diverse actions. Then, we designed a dynamic attention mechanism to help agents allocate attention to different information, addressing the limitations of the LLM attention mechanism. Agents start by responding to external information stimuli within a five-agent group, increasing group size and forming information circles while developing relationships and sharing information. Additionally, we explore the information diffusion features in the asymmetric open environment by observing the evolution of information gaps, diffusion patterns, and the accumulation of social capital, which are closely linked to psychological, sociological, and communication theories.
comment: 18 pages, 5 figures
Factorised Active Inference for Strategic Multi-Agent Interactions AAMAS-2025
Understanding how individual agents make strategic decisions within collectives is important for advancing fields as diverse as economics, neuroscience, and multi-agent systems. Two complementary approaches can be integrated to this end. The Active Inference framework (AIF) describes how agents employ a generative model to adapt their beliefs about and behaviour within their environment. Game theory formalises strategic interactions between agents with potentially competing objectives. To bridge the gap between the two, we propose a factorisation of the generative model whereby each agent maintains explicit, individual-level beliefs about the internal states of other agents, and uses them for strategic planning in a joint context. We apply our model to iterated general-sum games with two and three players, and study the ensemble effects of game transitions, where the agents' preferences (game payoffs) change over time. This non-stationarity, beyond that caused by reciprocal adaptation, reflects a more naturalistic environment in which agents need to adapt to changing social contexts. Finally, we present a dynamical analysis of key AIF quantities: the variational free energy (VFE) and the expected free energy (EFE) from numerical simulation data. The ensemble-level EFE allows us to characterise the basins of attraction of games with multiple Nash Equilibria under different conditions, and we find that it is not necessarily minimised at the aggregate level. By integrating AIF and game theory, we can gain deeper insights into how intelligent collectives emerge, learn, and optimise their actions in dynamic environments, both cooperative and non-cooperative.
comment: To appear in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2025). Detroit, USA, May 2025
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent cohesive subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years. While numerous approaches have been developed, they can be broadly categorized into three main types: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and decentralized training and execution (DTE). CTE methods assume centralization during training and execution (e.g., with fast, free, and perfect communication) and have the most information during execution. CTDE methods are the most common, as they leverage centralized information during training while enabling decentralized execution -- using only information available to that agent during execution. Decentralized training and execution methods make the fewest assumptions and are often simple to implement. This text is an introduction to cooperative MARL -- MARL in which all agents share a single, joint reward. It is meant to explain the setting, basic concepts, and common methods for the CTE, CTDE, and DTE settings. It does not cover all work in cooperative MARL as the area is quite extensive. I have included work that I believe is important for understanding the main concepts in the area and apologize to those that I have omitted. Topics include simple applications of single-agent methods to CTE as well as some more scalable methods that exploit the multi-agent structure, independent Q-learning and policy gradient methods and their extensions, as well as value function factorization methods including the well-known VDN, QMIX, and QPLEX approaches, and centralized critic methods including MADDPG, COMA, and MAPPO. I also discuss common misconceptions, the relationship between different approaches, and some open questions.
Systems and Control (CS)
Sequential QCQP for Bilevel Optimization with Line Search
Bilevel optimization involves a hierarchical structure where one problem is nested within another, leading to complex interdependencies between levels. We propose a single-loop, tuning-free algorithm that guarantees anytime feasibility, i.e., approximate satisfaction of the lower-level optimality condition, while ensuring descent of the upper-level objective. At each iteration, a convex quadratically-constrained quadratic program (QCQP) with a closed-form solution yields the search direction, followed by a backtracking line search inspired by control barrier functions to ensure safe, uniformly positive step sizes. The resulting method is scalable, requires no hyperparameter tuning, and converges under mild local regularity assumptions. We establish an O(1/k) ergodic convergence rate and demonstrate the algorithm's effectiveness on representative bilevel tasks.
comment: Under Review
Development of a Scaled Setup for Experimental Study of the Effect of Lateral Dynamics on Energy Consumption in Electric Vehicles: An Extension
Most of the existing state-of-the-art approaches for energy consumption analysis do not account for the effect of lateral dynamics on energy consumption in electric vehicles (EVs) during vehicle maneuvers. This paper aims to validate this effect through an experimental study. We develop a scaled model using a radio-controlled (RC) car, modified to achieve dynamic similitude with on-road vehicles, to conduct scaled experiments. The experimental results confirm the impact of lateral dynamics on both energy demand and driving range in electric vehicles, aligning with our previous findings [1], and emphasize the need to incorporate these factors into energy consumption models.
Comparison of Data-Driven Modeling Approaches for Control Optimization of Floating Offshore Wind Turbines
Models that balance accuracy against computational costs are advantageous when designing wind turbines with optimization studies, as several hundred predictive function evaluations might be necessary to identify the optimal solution. We explore different approaches to construct low-fidelity models that can be used to approximate dynamic quantities and be used as surrogates for design optimization studies and other use cases. In particular, low-fidelity modeling approaches using classical systems identification and deep learning approaches are considered against derivative function surrogate models ({DFSMs}), or approximate models of the state derivative function. This work proposes a novel method that utilizes a linear parameter varying (LPV) modeling scheme to construct the DFSM. We compare the trade-offs between these different models and explore the efficacy of the proposed DFSM approach in approximating wind turbine performance and design optimization studies for controllers. Results show that the proposed DFSM approach balances computational time and model accuracy better than the system identification and deep learning based models. Additionally, the DFSM provides nearly a fifty times speed-up compared to the high-fidelity model, while balancing accuracy.
comment: 16 pages, 15 figures
Security of Distributed Gradient Descent Against Byzantine Agents
This paper investigates the security of Decentralized Gradient Descent (DGD) algorithms on undirected graphs. Specifically, we consider Byzantine agents that inject stealthy attacks to degrade the performance of the DGD algorithm in terms of its optimal value. To mitigate the effect of stealthy attacks, some agents are allocated to monitor the evolution of their gradient. We then propose a method to quantify the maximum deviation caused by the Byzantine agent in the optimal value of the DGD. Our approach serves as a security metric to evaluate the robustness of graph structures against Byzantine attacks. We validate our findings through numerical simulations.
Efficient Configuration-Constrained Tube MPC via Variables Restriction and Template Selection
Configuration-Constrained Tube Model Predictive Control (CCTMPC) offers flexibility by using a polytopic parameterization of invariant sets and the optimization of an associated vertex control law. This flexibility, however, often demands computational trade-offs between set parameterization accuracy and optimization complexity. This paper proposes two innovations that help the user tackle this trade-off. First, a structured framework is proposed, which strategically limits optimization degrees of freedom, significantly reducing online computation time while retaining stability guarantees. This framework aligns with Homothetic Tube MPC (HTMPC) under maximal constraints. Second, a template refinement algorithm that iteratively solves quadratic programs is introduced to balance polytope complexity and conservatism. Simulation studies on an illustrative benchmark problem as well as a high-dimensional ten-state system demonstrate the approach's efficiency, achieving robust performance with minimal computational overhead. The results validate a practical pathway to leveraging CCTMPC's adaptability without sacrificing real-time viability.
comment: 10 pages, submitted to CDC 2025 conference
A system identification approach to clustering vector autoregressive time series
Clustering of time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction, where the autocorrelation pattern/feature is mostly ignored. Instead of relying on heuristic feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics. We first derive a clustering algorithm based on a mixture autoregressive model. Unfortunately it turns out to have significant computational problems. We then derive a `small-noise' limiting version of the algorithm, which we call k-LMVAR (Limiting Mixture Vector AutoRegression), that is computationally manageable. We develop an associated BIC criterion for choosing the number of clusters and model order. The algorithm performs very well in comparative simulations and also scales well computationally.
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations
This paper introduces a data-driven (i.e., model-free) approach to identify which inverter-based resources (IBRs) have dominant participation in poorly damped sub-synchronous oscillations (SSO), to get to the root cause for effective mitigation. An Enhanced Dynamic Mode Decomposition (eDMD) method is proposed that incorporates an appropriate set of observables. Based on time-synchronized data (either simulated or real) from IBR connection points, eDMD directly computes data-driven eigenvectors and participation factors to reveal the role of each IBR in poorly damped SSO. We show the improved accuracy of eDMD over conventional Dynamic Mode Decomposition (DMD) by benchmarking both against actual model-based analysis. We demonstrate this first through a synthetic example and then a case study on the IEEE 39-bus test system with 100% IBR. This data-driven, model-free method offers a powerful tool to foresee and mitigate the risk of IBR-induced SSO in planning (simulated data) and post-event analysis (real data) of SSO events.
comment: 10 pages, 14 figures, Journal paper.Submitted to IEEE Transactions on Power System
Functional Controllability, Functional Stabilisability, and the Generalised Separation Principle
This paper introduces the new concepts of Functional Controllability and Functional Stabilisability, and establishes their duality with Functional Observability and Functional Detectability, respectively. We further present a Generalised Separation Principle, demonstrating that the classical Separation Principle emerges as a special case. Conditions for the existence of functional controllers of a specified order are derived. Importantly, the design framework does not require full controllability. Furthermore, we develop a functional observer-based controller design applicable to systems that are both uncontrollable and unobservable. The results presented generalise the classical full-state feedback control paradigm.
comment: Under review in a journal
Statistically Optimal Structured Additive MIMO Continuous-time System Identification
Many applications in mechanical, acoustic, and electronic engineering require estimating complex dynamical models, often represented as additive multi-input multi-output (MIMO) transfer functions with structural constraints. This paper introduces a two-stage procedure for estimating structured additive MIMO models, where structural constraints are enforced through a weighted nonlinear least-squares projection of the parameter vector initially estimated using a recently developed refined instrumental variables algorithm. The proposed approach is shown to be consistent and asymptotically efficient in open-loop scenarios. In closed-loop settings, it remains consistent despite potential noise model misspecification and achieves minimum covariance among all instrumental variable estimators. Extensive simulations are performed to validate the theoretical findings, and to show the efficacy of the proposed approach.
comment: 15 pages, 5 figures
Gaming Strategies in European Imbalance Settlement Mechanisms
Transmission System Operators (TSOs) rely on balancing energy provided by Balancing Service Providers (BSPs) to maintain the supply-demand balance in real time. Balance Responsible Parties (BRPs) can simultaneously deviate from their day-ahead schedules in response to imbalance prices, e.g., by controlling flexible assets such as batteries. According to the European Electricity Balancing Guideline, these imbalance prices should incentivize BRPs performing such implicit or passive balancing to aid the TSO in restoring the energy balance. In this paper, we demonstrate that BRPs are unintentionally offered the opportunity to exploit gaming strategies in European imbalance settlement mechanisms. This is enabled by a disconnect between sub-quarter-hourly dynamics that determine the imbalance prices and the financial settlement on a quarter-hourly basis. We illustrate this behavior in a case study of the imbalance settlement mechanisms in Belgium and the Netherlands. Our results reveal that, in both countries, BRPs can, in theory, exploit the imbalance mechanism by increasing the instantaneous system imbalance during minutes within the quarter-hour that determine the imbalance price while still contributing to restoring the system balance for the rest of the quarter-hour.
VeRecycle: Reclaiming Guarantees from Probabilistic Certificates for Stochastic Dynamical Systems after Change IJCAI 2025
Autonomous systems operating in the real world encounter a range of uncertainties. Probabilistic neural Lyapunov certification is a powerful approach to proving safety of nonlinear stochastic dynamical systems. When faced with changes beyond the modeled uncertainties, e.g., unidentified obstacles, probabilistic certificates must be transferred to the new system dynamics. However, even when the changes are localized in a known part of the state space, state-of-the-art requires complete re-certification, which is particularly costly for neural certificates. We introduce VeRecycle, the first framework to formally reclaim guarantees for discrete-time stochastic dynamical systems. VeRecycle efficiently reuses probabilistic certificates when the system dynamics deviate only in a given subset of states. We present a general theoretical justification and algorithmic implementation. Our experimental evaluation shows scenarios where VeRecycle both saves significant computational effort and achieves competitive probabilistic guarantees in compositional neural control.
comment: accepted to IJCAI 2025
Symbolic and Numerical Tools for $L_{\infty}$-Norm Calculation
The computation of the $L_\infty $-norm is an important issue in $H_{\infty}$ control, particularly for analyzing system stability and robustness. This paper focuses on symbolic computation methods for determining the $L_{\infty} $-norm of finite-dimensional linear systems, highlighting their advantages in achieving exact solutions where numerical methods often encounter limitations. Key techniques such as Sturm-Habicht sequences, Rational Univariate Representations (RUR), and Cylindrical Algebraic Decomposition (CAD) are surveyed, with an emphasis on their theoretical foundations, practical implementations, and specific applicability to $ L_{\infty} $-norm computation. A comparative analysis is conducted between symbolic and conventional numerical approaches, underscoring scenarios in which symbolic computation provides superior accuracy, particularly in parametric cases. Benchmark evaluations reveal the strengths and limitations of both approaches, offering insights into the trade-offs involved. Finally, the discussion addresses the challenges of symbolic computation and explores future opportunities for its integration into control theory, particularly for robust and stable system analysis.
On the Input-Output Monotonicity of Voltage Dynamics of Power System with Grid-Forming Converters
Integration of renewable resources is profoundly reshaping the dynamics of modern power systems. This study shows that the voltage dynamics of power systems with multiple grid-forming (GFM) converters often enjoys a desirable property called input-output monotonicity. A systematic approach for computing the derivatives of the voltage subsystem is presented first, which provides insight into the structural characteristics of these models. Next, the sign pattern of the trajectory Jacobian matrix associated with the voltage subsystem is analyzed and revealed. The analysis indicates that the voltage dynamics of power systems often exhibits the so-called input-output monotonicity property. The theoretical results are then validated through several simulation examples, underscoring their practical implications.
C*: A Coverage Path Planning Algorithm for Unknown Environments using Rapidly Covering Graphs
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. The C* algorithm is built upon the concept of Rapidly Covering Graph (RCGs). The RCG is constructed incrementally via progressive sampling during robot navigation, which eliminates the need for cellular decomposition of the search space. The RCG has a sparse-graph structure formed by efficient sampling and pruning techniques, which produces non-myopic waypoints of the coverage trajectory. While C* produces the desired back and forth coverage pattern, it adapts to the TSP-based locally optimal coverage of small uncovered regions, called coverage holes, that are surrounded by obstacles and covered regions. Thus, C* proactively detects and covers the coverage holes in situ, which reduces the coverage time by preventing the longer return trajectories from distant regions to cover such holes later. The algorithmic simplicity and low computational complexity of C* makes it easy to implement and suitable for real-time onboard applications. It is analytically proven that C* provides complete coverage of unknown environments. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) real laboratory experiments using autonomous robots. A comparative evaluation with seven existing CPP methods demonstrate that C* yields significant performance improvements in terms of coverage time, number of turns, trajectory length and overlap ratio, while preventing the formation of coverage holes. Finally, C* is evaluated on two different applications of CPP using 1) energy-constrained robots and 2) multi-robot teams.
Customized Interior-Point Methods Solver for Embedded Real-Time Convex Optimization
This paper presents a customized convex optimization solver tailored for embedded real-time optimization, which frequently arise in modern guidance and control (G&C) applications. The solver employs a practically efficient predictor-corrector type primal-dual interior-point method (PDIPM) combined with a homogeneous embedding framework for infeasibility detection. Unlike conventional homogeneous self-dual embedding formulations, the adopted approach can directly handle quadratic cost functions without requiring problem reformulation. To support a systematic workflow, we also develop a code generation tool that analyzes the sparsity pattern of the provided problem family and generates customized solver code using a predefined code template. The generated solver code is written in C with no external dependencies other than the standard library math.h, and it supports complete static allocation of all data. Additionally, it provides parsing information to facilitate the use of the solver API by end users. Benchmark results and numerical experiments on an embedded platform demonstrate that the developed solver outperforms existing solvers in both efficiency and reliability.
SecCAN: An Extended CAN Controller with Embedded Intrusion Detection
Recent research has highlighted the vulnerability of in-vehicle network protocols such as controller area networks (CAN) and proposed machine learning-based intrusion detection systems (IDSs) as an effective mitigation technique. However, their efficient integration into vehicular architecture is non-trivial, with existing methods relying on electronic control units (ECUs)-coupled IDS accelerators or dedicated ECUs as IDS accelerators. Here, initiating IDS requires complete reception of a CAN message from the controller, incurring data movement and software overheads. In this paper, we present SecCAN, a novel CAN controller architecture that embeds IDS capability within the datapath of the controller. This integration allows IDS to tap messages directly from within the CAN controller as they are received from the bus, removing overheads incurred by existing ML-based IDSs. A custom-quantised machine-learning accelerator is developed as the IDS engine and embedded into SecCAN's receive data path, with optimisations to overlap the IDS inference with the protocol's reception window. We implement SecCAN on AMD XCZU7EV FPGA to quantify its performance and benefits in hardware, using multiple attack datasets. We show that SecCAN can completely hide the IDS latency within the CAN reception window for all CAN packet sizes and detect multiple attacks with state-of-the-art accuracy with zero software overheads on the ECU and low energy overhead (73.7 uJ per message) for IDS inference. Also, SecCAN incurs limited resource overhead compared to a standard CAN controller (< 30% LUT, < 1% FF), making it ideally suited for automotive deployment.
comment: 4 pages, 3 figures, 3 tables, Accepted in IEEE Embedded Systems Letters (https://ieee-ceda.org/publication/esl)
Understanding 6G through Language Models: A Case Study on LLM-aided Structured Entity Extraction in Telecom Domain
Knowledge understanding is a foundational part of envisioned 6G networks to advance network intelligence and AI-native network architectures. In this paradigm, information extraction plays a pivotal role in transforming fragmented telecom knowledge into well-structured formats, empowering diverse AI models to better understand network terminologies. This work proposes a novel language model-based information extraction technique, aiming to extract structured entities from the telecom context. The proposed telecom structured entity extraction (TeleSEE) technique applies a token-efficient representation method to predict entity types and attribute keys, aiming to save the number of output tokens and improve prediction accuracy. Meanwhile, TeleSEE involves a hierarchical parallel decoding method, improving the standard encoder-decoder architecture by integrating additional prompting and decoding strategies into entity extraction tasks. In addition, to better evaluate the performance of the proposed technique in the telecom domain, we further designed a dataset named 6GTech, including 2390 sentences and 23747 words from more than 100 6G-related technical publications. Finally, the experiment shows that the proposed TeleSEE method achieves higher accuracy than other baseline techniques, and also presents 5 to 9 times higher sample processing speed.
Learning POMDPs with Linear Function Approximation and Finite Memory
We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies. We provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory based near-optimal Q values. Convergence in this case requires further assumptions on the exploration policy when using general basis functions. We then show that these assumptions can be relaxed for specific models such as those with perfectly linear cost and dynamics, or when using discretization based basis functions.
Virtual Fluoroscopy for Interventional Guidance using Magnetic Tracking
Purpose: In conventional fluoroscopy-guided interventions, the 2D projective nature of X-ray imaging limits depth perception and leads to prolonged radiation exposure. Virtual fluoroscopy, combined with spatially tracked surgical instruments, is a promising strategy to mitigate these limitations. While magnetic tracking shows unique advantages, particularly in tracking flexible instruments, it remains under-explored due to interference from ferromagnetic materials in the C-arm room. This work proposes a virtual fluoroscopy workflow by effectively integrating magnetic tracking, and demonstrates its clinical efficacy. Methods: An automatic virtual fluoroscopy workflow was developed using a radiolucent tabletop field generator prototype. Specifically, we developed a fluoro-CT registration approach with automatic 2D-3D shared landmark correspondence to establish the C-arm-patient relationship, along with a general C-arm modelling approach to calculate desired poses and generate corresponding virtual fluoroscopic images. Results: Testing on a dataset with views ranging from RAO 90 degrees to LAO 90 degrees, simulated fluoroscopic images showed visually imperceptible differences from the real ones, achieving a mean target projection distance error of 1.55 mm. An endoleak phantom insertion experiment highlighted the effectiveness of simulating multiplanar views with real-time instrument overlays, achieving a mean needle tip error of 3.42 mm. Conclusions: Results demonstrated the efficacy of virtual fluoroscopy integrated with magnetic tracking, improving depth perception during navigation. The broad capture range of virtual fluoroscopy showed promise in improving the users understanding of X-ray imaging principles, facilitating more efficient image acquisition.
comment: 16 pages, 8 figures, 2025 IPCAI conference
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
Non-Iterative Coordination of Interconnected Power Grids via Dimension-Decomposition-Based Flexibility Aggregation
The bulk power grid is divided into regional grids interconnected with multiple tie-lines for efficient operation. Since interconnected power grids are operated by different control centers, it is a challenging task to realize coordinated dispatch of multiple regional grids. A viable solution is to compute a flexibility aggregation model for each regional power grid, then optimize the tie-line schedule using the aggregated models to implement non-iterative coordinated dispatch. However, challenges such as intricate interdependencies and curse of dimensionality persist in computing the aggregated models in high-dimensional space. Existing methods like Fourier-Motzkin elimination, vertex search, and multi-parameter programming are limited by dimensionality and conservatism, hindering practical application. This paper presents a novel dimension-decomposition-based flexibility aggregation algorithm for calculating the aggregated models of multiple regional power grids, enabling non-iterative coordination in large-scale interconnected systems. Compared to existing methods, the proposed approach yields a significantly less conservative flexibility region. The derived flexibility aggregation model for each regional power grid has a well-defined physical counterpart, which facilitates intuitive analysis of multi-port regional power grids and provides valuable insights into their internal resource endowments. Numerical tests validate the feasibility of the aggregated model and demonstrate its accuracy in coordinating interconnected power grids.
comment: 11 Pages
Diffusion-Based Failure Sampling for Evaluating Safety-Critical Autonomous Systems
Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.
comment: Appears in IEEE International Conference on Engineering Reliable Autonomous Systems (ERAS) 2025
Stabilizing Large-Scale Electric Power Grids with Adaptive Inertia
The stability of AC power grids relies on ancillary services that mitigate frequency fluctuations. The electromechanical inertia of large synchronous generators is currently the only resource to absorb frequency disturbances on sub-second time scales. Replacing standard thermal power plants with inertialess new renewable sources of energy (NRE) therefore jeopardizes grid stability against e.g. sudden power generation losses. To guarantee system stability and compensate the lack of electromechanical inertia in grids with large penetrations of NREs, virtual synchronous generators, that emulate conventional generators, have been proposed. Here, we propose a novel control scheme for virtual synchronous generators, where the provided inertia is large at short times -- thereby absorbing faults as efficiently as conventional generators -- but decreases over a tunable time scale to prevent coherent frequency oscillations from setting in. We evaluate the performance of this adaptive inertia scheme under sudden power losses in large-scale transmission grids. We find that it systematically outperforms conventional, electromechanical inertia and that it is more stable than previously suggested schemes. Numerical simulations show how a quasi-optimal geographical distribution of adaptive inertia devices not only absorbs local faults efficiently, but also significantly increases the damping of inter-area oscillations. Our results show that the proposed adaptive inertia control scheme is an excellent solution to strengthen grid stability in future low-inertia power grids with large penetrations of NREs.
comment: 13 pages, 16 figures. Final version with very significant additions and changes
Polynomial and Parallelizable Preconditioning for Block Tridiagonal Positive Definite Matrices
The efficient solution of moderately large-scale linear systems arising from the KKT conditions in optimal control problems (OCPs) is a critical challenge in robotics. With the stagnation of Moore's law, there is growing interest in leveraging GPU-accelerated iterative methods, and corresponding parallel preconditioners, to overcome these computational challenges. To improve the performance of such solvers, we introduce a parallel-friendly, parametrized multi-splitting polynomial preconditioner framework. We first construct and prove the optimal parametrization theoretically in terms of the least amount of distinct eigenvalues and the narrowest spectrum range. We then compare the theoretical time complexity of solving the linear system directly or iteratively. We finally show through numerical experiments how much the preconditioning improves the convergence of OCP linear systems solves.
comment: The initial submission of this work was reviewed in IEEE Control Systems Letters (L-CSS) with an option for presentation at the 2025 Conference on Decision and Control (CDC). This revised version incorporates all feedback from reviewers and editors, addressing theoretical clarifications, structural improvements, and methodological refinements suggested during the review process
ViMo: A Generative Visual GUI World Model for App Agents
App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. However, existing world models primarily focus on generating only textual descriptions, lacking essential visual details. To fill this gap, we propose ViMo, the first visual world model designed to generate future App observations as images. For the challenge of generating text in image patches, where even minor pixel errors can distort readability, we decompose GUI generation into graphic and text content generation. We propose a novel data representation, the Symbolic Text Representation~(STR) to overlay text content with symbolic placeholders while preserving graphics. With this design, ViMo employs a STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text. Moreover, we deploy ViMo to enhance agent-focused tasks by predicting the outcome of different action options. Experiments show ViMo's ability to generate visually plausible and functionally effective GUIs that enable App agents to make more informed decisions.
comment: https://ai-agents-2030.github.io/ViMo/
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
We study the sample complexity of online reinforcement learning in the general setting of nonlinear dynamical systems with continuous state and action spaces. Our analysis accommodates a large class of dynamical systems ranging from a finite set of nonlinear candidate models to models with bounded and Lipschitz continuous dynamics, to systems that are parametrized by a compact and real-valued set of parameters. In the most general setting, our algorithm achieves a policy regret of $\mathcal{O}(N \epsilon^2 + \mathrm{ln}(m(\epsilon))/\epsilon^2)$, where $N$ is the time horizon, $\epsilon$ is a user-specified discretization width, and $m(\epsilon)$ measures the complexity of the function class under consideration via its packing number. In the special case where the dynamics are parametrized by a compact and real-valued set of parameters (such as neural networks, transformers, etc.), we prove a policy regret of $\mathcal{O}(\sqrt{N p})$, where $p$ denotes the number of parameters, recovering earlier sample-complexity results that were derived for linear time-invariant dynamical systems. While this article focuses on characterizing sample complexity, the proposed algorithms are likely to be useful in practice, due to their simplicity, their ability to incorporate prior knowledge, and their benign transient behavior.
comment: 29 pages, 3 figures
Local Identifiability of Fully-Connected Feed-Forward Networks with Nonlinear Node Dynamics
We study the identifiability of nonlinear network systems with partial excitation and partial measurement when the network dynamics is linear on the edges and nonlinear on the nodes. We assume that the graph topology and the nonlinear functions at the node level are known, and we aim to identify the weight matrix of the graph. Our main result is to prove that fully-connected layered feed-forward networks are generically locally identifiable by exciting sources and measuring sinks in the class of analytic functions that cross the origin. This holds even when all other nodes remain unexcited and unmeasured and stands in sharp contrast to most findings on network identifiability requiring measurement and/or excitation of each node. The result applies in particular to feed-forward artificial neural networks with no offsets and generalizes previous literature by considering a broader class of functions and topologies.
comment: 9 pages, 2 figures
Synchronous Observer Design for Inertial Navigation Systems with Almost-Global Convergence
An Inertial Navigation System (INS) is a system that integrates acceleration and angular velocity readings from an Inertial Measurement Unit (IMU), along with other sensors such as Global Navigation Satellite Systems (GNSS) position, GNSS velocity, and magnetometer, to estimate the attitude, velocity, and position of a vehicle. This paper shows that the INS problem can be analysed using the automorphism group of the extended special Euclidean group: a group we term the extended similarity group . By exploiting this novel geometric framework, we propose a synchronous observer architecture; that is, an observer architecture for which the observer error is stationary if the correction terms are set to zero. In turn, this enables us to derive a modular, or plug-and-play, observer design for INS that allows different sensors to be added or removed depending on what is available in the vehicle sensor suite. We prove both almost-global asymptotic and local exponential stability of the error dynamics for the common scenario of at least IMU and GNSS position. To the authors' knowledge, this is the first non-linear observer design with almost global convergence guarantees or with plug-and-play modular capability. A simulation with extreme initial error demonstrates the almost-global robustness of the system. Real-world capability is demonstrated on data from a fixed-wing UAV, and the solution is compared to the state-of-the-art ArduPilot INS.
comment: 19 pages, 4 figures, author accepted version
Adaptive Reinforcement Learning for State Avoidance in Discrete Event Systems
Reinforcement learning (RL) has emerged as a potent paradigm for autonomous decision-making in complex environments. However, the integration of event-driven decision processes within RL remains a challenge. This paper presents a novel architecture that combines a Discrete Event Supervisory (DES) model with a standard RL framework to create a hybrid decision-making system. Our model leverages the DES's capabilities in managing event-based dynamics with the RL agent's adaptability to continuous states and actions, facilitating a more robust and flexible control strategy in systems characterized by both continuous and discrete events. The DES model operates alongside the RL agent, enhancing the policy's performance with event-based insights, while the environment's state transitions are governed by a mechanistic model. We demonstrate the efficacy of our approach through simulations that show improved performance metrics over traditional RL implementations. Our results suggest that this integrated approach holds promise for applications ranging from industrial automation to intelligent traffic systems, where discrete event handling is paramount.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
A Review of Stop-and-Go Traffic Wave Suppression Strategies: Variable Speed Limit vs. Jam-Absorption Driving
The main form of freeway traffic congestion is the familiar stop-and-go wave, characterized by wide moving jams that propagate indefinitely upstream provided enough traffic demand. They cause severe, long-lasting adverse effects, such as reduced traffic efficiency, increased driving risks, and higher vehicle emissions. This underscores the crucial importance of artificial intervention in the propagation of stop-and-go waves. Over the past two decades, two prominent strategies for stop-and-go wave suppression have emerged: variable speed limit (VSL) and jam-absorption driving (JAD). Although they share similar research motivations, objectives, and theoretical foundations, the development of these strategies has remained relatively disconnected. To synthesize fragmented advances and drive the field forward, this paper first provides a comprehensive review of the achievements in the stop-and-go wave suppression-oriented VSL and JAD, respectively. It then focuses on bridging the two areas and identifying research opportunities from the following perspectives: fundamental diagrams, secondary waves, generalizability, traffic state estimation and prediction, robustness to randomness, scenarios for strategy validation, and field tests and practical deployment. We expect that through this review, one area can effectively address its limitations by identifying and leveraging the strengths of the other, thus promoting the overall research goal of freeway stop-and-go wave suppression.
Quality-Aware Hydraulic Control in Drinking Water Networks via Controllability Proxies
The operation of water distribution networks simply aims at efficiently delivering consumers adequate water while maintaining safe water quality (WQ). However, this process entails a multi-scale interplay between hydraulic and WQ dynamics evolving spatio-temporally within such a complex infrastructure network. While prior research has addressed the hydraulic optimization problem and WQ regulation as decoupled or coupled, they often overlook control-theoretic guided solutions. This paper takes a novel approach by investigating the coupling between hydraulic and WQ dynamics from a control networks perspective. We propose a quality-aware control framework that embeds WQ controllability metrics into the network-level pump scheduling problem, acknowledging the direct influence of system hydraulics on WQ controller behavior. We examine the trade-offs between pump control energy cost and WQ performance across various network sizes and scenarios. Our results showcase how network topology, hydraulic constraints, and WQ metrics jointly impact optimal pump schedules and, accordingly, the achievable level of WQ regulation, offering insights into designing efficient control strategies for water infrastructure networks governed by interdependent dynamics.
The Role of Integrity Monitoring in Connected and Automated Vehicles: Current State-of-Practice and Future Directions
Positioning integrity refers to the trust in the performance of a navigation system. Accurate and reliable position information is needed to meet the requirements of connected and Automated Vehicle (CAV) applications, particularly in safety-critical scenarios. Receiver Autonomous Integrity Monitoring (RAIM) and its variants have been widely studied for Global Navigation Satellite System (GNSS)-based vehicle positioning, often fused with kinematic (e.g., Odometry) and perception sensors (e.g., camera). However, integrity monitoring (IM) for cooperative positioning solutions leveraging Vehicle-to-Everything (V2X) communication has received comparatively limited attention. This paper reviews existing research in the field of positioning IM and identifies various research gaps. Particular attention has been placed on identifying research that highlights cooperative IM methods. It also examines key automotive safety standards and public V2X datasets to map current research priorities and uncover critical gaps. Finally, the paper outlines promising future directions, highlighting research topics aimed at advancing and benchmarking positioning integrity.
comment: This work has been submitted to the IEEE for possible publication
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent cohesive subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
Underapproximating Safe Domains of Attraction for Discrete-Time Systems Using Implicit Representations of Backward Reachable Sets
Analyzing and certifying stability and attractivity of nonlinear systems is a topic of research interest that has been extensively investigated by control theorists and engineers for many years. Despite that, accurately estimating domains of attraction for nonlinear systems remains a challenging task, where available estimation approaches are either conservative or limited to low-dimensional systems. In this work, we propose an iterative approach to accurately underapproximate safe (i.e., state-constrained) domains of attraction for general discrete-time autonomous nonlinear systems. Our approach relies on implicit representations of safe backward reachable sets of safe regions of attraction, where such regions can be be easily constructed using, e.g., quadratic Lyapunov functions. The iterations of our approach are monotonic (in the sense of set inclusion), where each iteration results in a safe region of attraction, given as a sublevel set, that underapproximates the safe domain of attraction. The sublevel set representations of the resulting regions of attraction can be efficiently utilized in verifying the inclusion of given points of interest in the safe domain of attraction. We illustrate our approach through two numerical examples, involving two- and four-dimensional nonlinear systems.
comment: This updated manuscript corrects errors in the formulas for the bounds used in computing ellipsoidal regions of attraction
Systems and Control (EESS)
Sequential QCQP for Bilevel Optimization with Line Search
Bilevel optimization involves a hierarchical structure where one problem is nested within another, leading to complex interdependencies between levels. We propose a single-loop, tuning-free algorithm that guarantees anytime feasibility, i.e., approximate satisfaction of the lower-level optimality condition, while ensuring descent of the upper-level objective. At each iteration, a convex quadratically-constrained quadratic program (QCQP) with a closed-form solution yields the search direction, followed by a backtracking line search inspired by control barrier functions to ensure safe, uniformly positive step sizes. The resulting method is scalable, requires no hyperparameter tuning, and converges under mild local regularity assumptions. We establish an O(1/k) ergodic convergence rate and demonstrate the algorithm's effectiveness on representative bilevel tasks.
comment: Under Review
Development of a Scaled Setup for Experimental Study of the Effect of Lateral Dynamics on Energy Consumption in Electric Vehicles: An Extension
Most of the existing state-of-the-art approaches for energy consumption analysis do not account for the effect of lateral dynamics on energy consumption in electric vehicles (EVs) during vehicle maneuvers. This paper aims to validate this effect through an experimental study. We develop a scaled model using a radio-controlled (RC) car, modified to achieve dynamic similitude with on-road vehicles, to conduct scaled experiments. The experimental results confirm the impact of lateral dynamics on both energy demand and driving range in electric vehicles, aligning with our previous findings [1], and emphasize the need to incorporate these factors into energy consumption models.
Comparison of Data-Driven Modeling Approaches for Control Optimization of Floating Offshore Wind Turbines
Models that balance accuracy against computational costs are advantageous when designing wind turbines with optimization studies, as several hundred predictive function evaluations might be necessary to identify the optimal solution. We explore different approaches to construct low-fidelity models that can be used to approximate dynamic quantities and be used as surrogates for design optimization studies and other use cases. In particular, low-fidelity modeling approaches using classical systems identification and deep learning approaches are considered against derivative function surrogate models ({DFSMs}), or approximate models of the state derivative function. This work proposes a novel method that utilizes a linear parameter varying (LPV) modeling scheme to construct the DFSM. We compare the trade-offs between these different models and explore the efficacy of the proposed DFSM approach in approximating wind turbine performance and design optimization studies for controllers. Results show that the proposed DFSM approach balances computational time and model accuracy better than the system identification and deep learning based models. Additionally, the DFSM provides nearly a fifty times speed-up compared to the high-fidelity model, while balancing accuracy.
comment: 16 pages, 15 figures
Security of Distributed Gradient Descent Against Byzantine Agents
This paper investigates the security of Decentralized Gradient Descent (DGD) algorithms on undirected graphs. Specifically, we consider Byzantine agents that inject stealthy attacks to degrade the performance of the DGD algorithm in terms of its optimal value. To mitigate the effect of stealthy attacks, some agents are allocated to monitor the evolution of their gradient. We then propose a method to quantify the maximum deviation caused by the Byzantine agent in the optimal value of the DGD. Our approach serves as a security metric to evaluate the robustness of graph structures against Byzantine attacks. We validate our findings through numerical simulations.
Efficient Configuration-Constrained Tube MPC via Variables Restriction and Template Selection
Configuration-Constrained Tube Model Predictive Control (CCTMPC) offers flexibility by using a polytopic parameterization of invariant sets and the optimization of an associated vertex control law. This flexibility, however, often demands computational trade-offs between set parameterization accuracy and optimization complexity. This paper proposes two innovations that help the user tackle this trade-off. First, a structured framework is proposed, which strategically limits optimization degrees of freedom, significantly reducing online computation time while retaining stability guarantees. This framework aligns with Homothetic Tube MPC (HTMPC) under maximal constraints. Second, a template refinement algorithm that iteratively solves quadratic programs is introduced to balance polytope complexity and conservatism. Simulation studies on an illustrative benchmark problem as well as a high-dimensional ten-state system demonstrate the approach's efficiency, achieving robust performance with minimal computational overhead. The results validate a practical pathway to leveraging CCTMPC's adaptability without sacrificing real-time viability.
comment: 10 pages, submitted to CDC 2025 conference
A system identification approach to clustering vector autoregressive time series
Clustering of time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction, where the autocorrelation pattern/feature is mostly ignored. Instead of relying on heuristic feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics. We first derive a clustering algorithm based on a mixture autoregressive model. Unfortunately it turns out to have significant computational problems. We then derive a `small-noise' limiting version of the algorithm, which we call k-LMVAR (Limiting Mixture Vector AutoRegression), that is computationally manageable. We develop an associated BIC criterion for choosing the number of clusters and model order. The algorithm performs very well in comparative simulations and also scales well computationally.
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations
This paper introduces a data-driven (i.e., model-free) approach to identify which inverter-based resources (IBRs) have dominant participation in poorly damped sub-synchronous oscillations (SSO), to get to the root cause for effective mitigation. An Enhanced Dynamic Mode Decomposition (eDMD) method is proposed that incorporates an appropriate set of observables. Based on time-synchronized data (either simulated or real) from IBR connection points, eDMD directly computes data-driven eigenvectors and participation factors to reveal the role of each IBR in poorly damped SSO. We show the improved accuracy of eDMD over conventional Dynamic Mode Decomposition (DMD) by benchmarking both against actual model-based analysis. We demonstrate this first through a synthetic example and then a case study on the IEEE 39-bus test system with 100% IBR. This data-driven, model-free method offers a powerful tool to foresee and mitigate the risk of IBR-induced SSO in planning (simulated data) and post-event analysis (real data) of SSO events.
comment: 10 pages, 14 figures, Journal paper.Submitted to IEEE Transactions on Power System
Functional Controllability, Functional Stabilisability, and the Generalised Separation Principle
This paper introduces the new concepts of Functional Controllability and Functional Stabilisability, and establishes their duality with Functional Observability and Functional Detectability, respectively. We further present a Generalised Separation Principle, demonstrating that the classical Separation Principle emerges as a special case. Conditions for the existence of functional controllers of a specified order are derived. Importantly, the design framework does not require full controllability. Furthermore, we develop a functional observer-based controller design applicable to systems that are both uncontrollable and unobservable. The results presented generalise the classical full-state feedback control paradigm.
comment: Under review in a journal
Statistically Optimal Structured Additive MIMO Continuous-time System Identification
Many applications in mechanical, acoustic, and electronic engineering require estimating complex dynamical models, often represented as additive multi-input multi-output (MIMO) transfer functions with structural constraints. This paper introduces a two-stage procedure for estimating structured additive MIMO models, where structural constraints are enforced through a weighted nonlinear least-squares projection of the parameter vector initially estimated using a recently developed refined instrumental variables algorithm. The proposed approach is shown to be consistent and asymptotically efficient in open-loop scenarios. In closed-loop settings, it remains consistent despite potential noise model misspecification and achieves minimum covariance among all instrumental variable estimators. Extensive simulations are performed to validate the theoretical findings, and to show the efficacy of the proposed approach.
comment: 15 pages, 5 figures
Gaming Strategies in European Imbalance Settlement Mechanisms
Transmission System Operators (TSOs) rely on balancing energy provided by Balancing Service Providers (BSPs) to maintain the supply-demand balance in real time. Balance Responsible Parties (BRPs) can simultaneously deviate from their day-ahead schedules in response to imbalance prices, e.g., by controlling flexible assets such as batteries. According to the European Electricity Balancing Guideline, these imbalance prices should incentivize BRPs performing such implicit or passive balancing to aid the TSO in restoring the energy balance. In this paper, we demonstrate that BRPs are unintentionally offered the opportunity to exploit gaming strategies in European imbalance settlement mechanisms. This is enabled by a disconnect between sub-quarter-hourly dynamics that determine the imbalance prices and the financial settlement on a quarter-hourly basis. We illustrate this behavior in a case study of the imbalance settlement mechanisms in Belgium and the Netherlands. Our results reveal that, in both countries, BRPs can, in theory, exploit the imbalance mechanism by increasing the instantaneous system imbalance during minutes within the quarter-hour that determine the imbalance price while still contributing to restoring the system balance for the rest of the quarter-hour.
VeRecycle: Reclaiming Guarantees from Probabilistic Certificates for Stochastic Dynamical Systems after Change IJCAI 2025
Autonomous systems operating in the real world encounter a range of uncertainties. Probabilistic neural Lyapunov certification is a powerful approach to proving safety of nonlinear stochastic dynamical systems. When faced with changes beyond the modeled uncertainties, e.g., unidentified obstacles, probabilistic certificates must be transferred to the new system dynamics. However, even when the changes are localized in a known part of the state space, state-of-the-art requires complete re-certification, which is particularly costly for neural certificates. We introduce VeRecycle, the first framework to formally reclaim guarantees for discrete-time stochastic dynamical systems. VeRecycle efficiently reuses probabilistic certificates when the system dynamics deviate only in a given subset of states. We present a general theoretical justification and algorithmic implementation. Our experimental evaluation shows scenarios where VeRecycle both saves significant computational effort and achieves competitive probabilistic guarantees in compositional neural control.
comment: accepted to IJCAI 2025
Symbolic and Numerical Tools for $L_{\infty}$-Norm Calculation
The computation of the $L_\infty $-norm is an important issue in $H_{\infty}$ control, particularly for analyzing system stability and robustness. This paper focuses on symbolic computation methods for determining the $L_{\infty} $-norm of finite-dimensional linear systems, highlighting their advantages in achieving exact solutions where numerical methods often encounter limitations. Key techniques such as Sturm-Habicht sequences, Rational Univariate Representations (RUR), and Cylindrical Algebraic Decomposition (CAD) are surveyed, with an emphasis on their theoretical foundations, practical implementations, and specific applicability to $ L_{\infty} $-norm computation. A comparative analysis is conducted between symbolic and conventional numerical approaches, underscoring scenarios in which symbolic computation provides superior accuracy, particularly in parametric cases. Benchmark evaluations reveal the strengths and limitations of both approaches, offering insights into the trade-offs involved. Finally, the discussion addresses the challenges of symbolic computation and explores future opportunities for its integration into control theory, particularly for robust and stable system analysis.
On the Input-Output Monotonicity of Voltage Dynamics of Power System with Grid-Forming Converters
Integration of renewable resources is profoundly reshaping the dynamics of modern power systems. This study shows that the voltage dynamics of power systems with multiple grid-forming (GFM) converters often enjoys a desirable property called input-output monotonicity. A systematic approach for computing the derivatives of the voltage subsystem is presented first, which provides insight into the structural characteristics of these models. Next, the sign pattern of the trajectory Jacobian matrix associated with the voltage subsystem is analyzed and revealed. The analysis indicates that the voltage dynamics of power systems often exhibits the so-called input-output monotonicity property. The theoretical results are then validated through several simulation examples, underscoring their practical implications.
C*: A Coverage Path Planning Algorithm for Unknown Environments using Rapidly Covering Graphs
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. The C* algorithm is built upon the concept of Rapidly Covering Graph (RCGs). The RCG is constructed incrementally via progressive sampling during robot navigation, which eliminates the need for cellular decomposition of the search space. The RCG has a sparse-graph structure formed by efficient sampling and pruning techniques, which produces non-myopic waypoints of the coverage trajectory. While C* produces the desired back and forth coverage pattern, it adapts to the TSP-based locally optimal coverage of small uncovered regions, called coverage holes, that are surrounded by obstacles and covered regions. Thus, C* proactively detects and covers the coverage holes in situ, which reduces the coverage time by preventing the longer return trajectories from distant regions to cover such holes later. The algorithmic simplicity and low computational complexity of C* makes it easy to implement and suitable for real-time onboard applications. It is analytically proven that C* provides complete coverage of unknown environments. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) real laboratory experiments using autonomous robots. A comparative evaluation with seven existing CPP methods demonstrate that C* yields significant performance improvements in terms of coverage time, number of turns, trajectory length and overlap ratio, while preventing the formation of coverage holes. Finally, C* is evaluated on two different applications of CPP using 1) energy-constrained robots and 2) multi-robot teams.
Customized Interior-Point Methods Solver for Embedded Real-Time Convex Optimization
This paper presents a customized convex optimization solver tailored for embedded real-time optimization, which frequently arise in modern guidance and control (G&C) applications. The solver employs a practically efficient predictor-corrector type primal-dual interior-point method (PDIPM) combined with a homogeneous embedding framework for infeasibility detection. Unlike conventional homogeneous self-dual embedding formulations, the adopted approach can directly handle quadratic cost functions without requiring problem reformulation. To support a systematic workflow, we also develop a code generation tool that analyzes the sparsity pattern of the provided problem family and generates customized solver code using a predefined code template. The generated solver code is written in C with no external dependencies other than the standard library math.h, and it supports complete static allocation of all data. Additionally, it provides parsing information to facilitate the use of the solver API by end users. Benchmark results and numerical experiments on an embedded platform demonstrate that the developed solver outperforms existing solvers in both efficiency and reliability.
SecCAN: An Extended CAN Controller with Embedded Intrusion Detection
Recent research has highlighted the vulnerability of in-vehicle network protocols such as controller area networks (CAN) and proposed machine learning-based intrusion detection systems (IDSs) as an effective mitigation technique. However, their efficient integration into vehicular architecture is non-trivial, with existing methods relying on electronic control units (ECUs)-coupled IDS accelerators or dedicated ECUs as IDS accelerators. Here, initiating IDS requires complete reception of a CAN message from the controller, incurring data movement and software overheads. In this paper, we present SecCAN, a novel CAN controller architecture that embeds IDS capability within the datapath of the controller. This integration allows IDS to tap messages directly from within the CAN controller as they are received from the bus, removing overheads incurred by existing ML-based IDSs. A custom-quantised machine-learning accelerator is developed as the IDS engine and embedded into SecCAN's receive data path, with optimisations to overlap the IDS inference with the protocol's reception window. We implement SecCAN on AMD XCZU7EV FPGA to quantify its performance and benefits in hardware, using multiple attack datasets. We show that SecCAN can completely hide the IDS latency within the CAN reception window for all CAN packet sizes and detect multiple attacks with state-of-the-art accuracy with zero software overheads on the ECU and low energy overhead (73.7 uJ per message) for IDS inference. Also, SecCAN incurs limited resource overhead compared to a standard CAN controller (< 30% LUT, < 1% FF), making it ideally suited for automotive deployment.
comment: 4 pages, 3 figures, 3 tables, Accepted in IEEE Embedded Systems Letters (https://ieee-ceda.org/publication/esl)
Understanding 6G through Language Models: A Case Study on LLM-aided Structured Entity Extraction in Telecom Domain
Knowledge understanding is a foundational part of envisioned 6G networks to advance network intelligence and AI-native network architectures. In this paradigm, information extraction plays a pivotal role in transforming fragmented telecom knowledge into well-structured formats, empowering diverse AI models to better understand network terminologies. This work proposes a novel language model-based information extraction technique, aiming to extract structured entities from the telecom context. The proposed telecom structured entity extraction (TeleSEE) technique applies a token-efficient representation method to predict entity types and attribute keys, aiming to save the number of output tokens and improve prediction accuracy. Meanwhile, TeleSEE involves a hierarchical parallel decoding method, improving the standard encoder-decoder architecture by integrating additional prompting and decoding strategies into entity extraction tasks. In addition, to better evaluate the performance of the proposed technique in the telecom domain, we further designed a dataset named 6GTech, including 2390 sentences and 23747 words from more than 100 6G-related technical publications. Finally, the experiment shows that the proposed TeleSEE method achieves higher accuracy than other baseline techniques, and also presents 5 to 9 times higher sample processing speed.
Learning POMDPs with Linear Function Approximation and Finite Memory
We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies. We provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory based near-optimal Q values. Convergence in this case requires further assumptions on the exploration policy when using general basis functions. We then show that these assumptions can be relaxed for specific models such as those with perfectly linear cost and dynamics, or when using discretization based basis functions.
Virtual Fluoroscopy for Interventional Guidance using Magnetic Tracking
Purpose: In conventional fluoroscopy-guided interventions, the 2D projective nature of X-ray imaging limits depth perception and leads to prolonged radiation exposure. Virtual fluoroscopy, combined with spatially tracked surgical instruments, is a promising strategy to mitigate these limitations. While magnetic tracking shows unique advantages, particularly in tracking flexible instruments, it remains under-explored due to interference from ferromagnetic materials in the C-arm room. This work proposes a virtual fluoroscopy workflow by effectively integrating magnetic tracking, and demonstrates its clinical efficacy. Methods: An automatic virtual fluoroscopy workflow was developed using a radiolucent tabletop field generator prototype. Specifically, we developed a fluoro-CT registration approach with automatic 2D-3D shared landmark correspondence to establish the C-arm-patient relationship, along with a general C-arm modelling approach to calculate desired poses and generate corresponding virtual fluoroscopic images. Results: Testing on a dataset with views ranging from RAO 90 degrees to LAO 90 degrees, simulated fluoroscopic images showed visually imperceptible differences from the real ones, achieving a mean target projection distance error of 1.55 mm. An endoleak phantom insertion experiment highlighted the effectiveness of simulating multiplanar views with real-time instrument overlays, achieving a mean needle tip error of 3.42 mm. Conclusions: Results demonstrated the efficacy of virtual fluoroscopy integrated with magnetic tracking, improving depth perception during navigation. The broad capture range of virtual fluoroscopy showed promise in improving the users understanding of X-ray imaging principles, facilitating more efficient image acquisition.
comment: 16 pages, 8 figures, 2025 IPCAI conference
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
Non-Iterative Coordination of Interconnected Power Grids via Dimension-Decomposition-Based Flexibility Aggregation
The bulk power grid is divided into regional grids interconnected with multiple tie-lines for efficient operation. Since interconnected power grids are operated by different control centers, it is a challenging task to realize coordinated dispatch of multiple regional grids. A viable solution is to compute a flexibility aggregation model for each regional power grid, then optimize the tie-line schedule using the aggregated models to implement non-iterative coordinated dispatch. However, challenges such as intricate interdependencies and curse of dimensionality persist in computing the aggregated models in high-dimensional space. Existing methods like Fourier-Motzkin elimination, vertex search, and multi-parameter programming are limited by dimensionality and conservatism, hindering practical application. This paper presents a novel dimension-decomposition-based flexibility aggregation algorithm for calculating the aggregated models of multiple regional power grids, enabling non-iterative coordination in large-scale interconnected systems. Compared to existing methods, the proposed approach yields a significantly less conservative flexibility region. The derived flexibility aggregation model for each regional power grid has a well-defined physical counterpart, which facilitates intuitive analysis of multi-port regional power grids and provides valuable insights into their internal resource endowments. Numerical tests validate the feasibility of the aggregated model and demonstrate its accuracy in coordinating interconnected power grids.
comment: 11 Pages
Diffusion-Based Failure Sampling for Evaluating Safety-Critical Autonomous Systems
Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.
comment: Appears in IEEE International Conference on Engineering Reliable Autonomous Systems (ERAS) 2025
Stabilizing Large-Scale Electric Power Grids with Adaptive Inertia
The stability of AC power grids relies on ancillary services that mitigate frequency fluctuations. The electromechanical inertia of large synchronous generators is currently the only resource to absorb frequency disturbances on sub-second time scales. Replacing standard thermal power plants with inertialess new renewable sources of energy (NRE) therefore jeopardizes grid stability against e.g. sudden power generation losses. To guarantee system stability and compensate the lack of electromechanical inertia in grids with large penetrations of NREs, virtual synchronous generators, that emulate conventional generators, have been proposed. Here, we propose a novel control scheme for virtual synchronous generators, where the provided inertia is large at short times -- thereby absorbing faults as efficiently as conventional generators -- but decreases over a tunable time scale to prevent coherent frequency oscillations from setting in. We evaluate the performance of this adaptive inertia scheme under sudden power losses in large-scale transmission grids. We find that it systematically outperforms conventional, electromechanical inertia and that it is more stable than previously suggested schemes. Numerical simulations show how a quasi-optimal geographical distribution of adaptive inertia devices not only absorbs local faults efficiently, but also significantly increases the damping of inter-area oscillations. Our results show that the proposed adaptive inertia control scheme is an excellent solution to strengthen grid stability in future low-inertia power grids with large penetrations of NREs.
comment: 13 pages, 16 figures. Final version with very significant additions and changes
Polynomial and Parallelizable Preconditioning for Block Tridiagonal Positive Definite Matrices
The efficient solution of moderately large-scale linear systems arising from the KKT conditions in optimal control problems (OCPs) is a critical challenge in robotics. With the stagnation of Moore's law, there is growing interest in leveraging GPU-accelerated iterative methods, and corresponding parallel preconditioners, to overcome these computational challenges. To improve the performance of such solvers, we introduce a parallel-friendly, parametrized multi-splitting polynomial preconditioner framework. We first construct and prove the optimal parametrization theoretically in terms of the least amount of distinct eigenvalues and the narrowest spectrum range. We then compare the theoretical time complexity of solving the linear system directly or iteratively. We finally show through numerical experiments how much the preconditioning improves the convergence of OCP linear systems solves.
comment: The initial submission of this work was reviewed in IEEE Control Systems Letters (L-CSS) with an option for presentation at the 2025 Conference on Decision and Control (CDC). This revised version incorporates all feedback from reviewers and editors, addressing theoretical clarifications, structural improvements, and methodological refinements suggested during the review process
ViMo: A Generative Visual GUI World Model for App Agents
App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. However, existing world models primarily focus on generating only textual descriptions, lacking essential visual details. To fill this gap, we propose ViMo, the first visual world model designed to generate future App observations as images. For the challenge of generating text in image patches, where even minor pixel errors can distort readability, we decompose GUI generation into graphic and text content generation. We propose a novel data representation, the Symbolic Text Representation~(STR) to overlay text content with symbolic placeholders while preserving graphics. With this design, ViMo employs a STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text. Moreover, we deploy ViMo to enhance agent-focused tasks by predicting the outcome of different action options. Experiments show ViMo's ability to generate visually plausible and functionally effective GUIs that enable App agents to make more informed decisions.
comment: https://ai-agents-2030.github.io/ViMo/
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
We study the sample complexity of online reinforcement learning in the general setting of nonlinear dynamical systems with continuous state and action spaces. Our analysis accommodates a large class of dynamical systems ranging from a finite set of nonlinear candidate models to models with bounded and Lipschitz continuous dynamics, to systems that are parametrized by a compact and real-valued set of parameters. In the most general setting, our algorithm achieves a policy regret of $\mathcal{O}(N \epsilon^2 + \mathrm{ln}(m(\epsilon))/\epsilon^2)$, where $N$ is the time horizon, $\epsilon$ is a user-specified discretization width, and $m(\epsilon)$ measures the complexity of the function class under consideration via its packing number. In the special case where the dynamics are parametrized by a compact and real-valued set of parameters (such as neural networks, transformers, etc.), we prove a policy regret of $\mathcal{O}(\sqrt{N p})$, where $p$ denotes the number of parameters, recovering earlier sample-complexity results that were derived for linear time-invariant dynamical systems. While this article focuses on characterizing sample complexity, the proposed algorithms are likely to be useful in practice, due to their simplicity, their ability to incorporate prior knowledge, and their benign transient behavior.
comment: 29 pages, 3 figures
Local Identifiability of Fully-Connected Feed-Forward Networks with Nonlinear Node Dynamics
We study the identifiability of nonlinear network systems with partial excitation and partial measurement when the network dynamics is linear on the edges and nonlinear on the nodes. We assume that the graph topology and the nonlinear functions at the node level are known, and we aim to identify the weight matrix of the graph. Our main result is to prove that fully-connected layered feed-forward networks are generically locally identifiable by exciting sources and measuring sinks in the class of analytic functions that cross the origin. This holds even when all other nodes remain unexcited and unmeasured and stands in sharp contrast to most findings on network identifiability requiring measurement and/or excitation of each node. The result applies in particular to feed-forward artificial neural networks with no offsets and generalizes previous literature by considering a broader class of functions and topologies.
comment: 9 pages, 2 figures
Synchronous Observer Design for Inertial Navigation Systems with Almost-Global Convergence
An Inertial Navigation System (INS) is a system that integrates acceleration and angular velocity readings from an Inertial Measurement Unit (IMU), along with other sensors such as Global Navigation Satellite Systems (GNSS) position, GNSS velocity, and magnetometer, to estimate the attitude, velocity, and position of a vehicle. This paper shows that the INS problem can be analysed using the automorphism group of the extended special Euclidean group: a group we term the extended similarity group . By exploiting this novel geometric framework, we propose a synchronous observer architecture; that is, an observer architecture for which the observer error is stationary if the correction terms are set to zero. In turn, this enables us to derive a modular, or plug-and-play, observer design for INS that allows different sensors to be added or removed depending on what is available in the vehicle sensor suite. We prove both almost-global asymptotic and local exponential stability of the error dynamics for the common scenario of at least IMU and GNSS position. To the authors' knowledge, this is the first non-linear observer design with almost global convergence guarantees or with plug-and-play modular capability. A simulation with extreme initial error demonstrates the almost-global robustness of the system. Real-world capability is demonstrated on data from a fixed-wing UAV, and the solution is compared to the state-of-the-art ArduPilot INS.
comment: 19 pages, 4 figures, author accepted version
Adaptive Reinforcement Learning for State Avoidance in Discrete Event Systems
Reinforcement learning (RL) has emerged as a potent paradigm for autonomous decision-making in complex environments. However, the integration of event-driven decision processes within RL remains a challenge. This paper presents a novel architecture that combines a Discrete Event Supervisory (DES) model with a standard RL framework to create a hybrid decision-making system. Our model leverages the DES's capabilities in managing event-based dynamics with the RL agent's adaptability to continuous states and actions, facilitating a more robust and flexible control strategy in systems characterized by both continuous and discrete events. The DES model operates alongside the RL agent, enhancing the policy's performance with event-based insights, while the environment's state transitions are governed by a mechanistic model. We demonstrate the efficacy of our approach through simulations that show improved performance metrics over traditional RL implementations. Our results suggest that this integrated approach holds promise for applications ranging from industrial automation to intelligent traffic systems, where discrete event handling is paramount.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: This submission was made prematurely and without obtaining the appropriate permissions from all individuals initially listed. I now recognize that the submission did not meet the standards of authorship or originality expected for preprints. I am withdrawing it out of respect for academic integrity and to ensure that all future work is submitted in accordance with proper ethical guidelines
A Review of Stop-and-Go Traffic Wave Suppression Strategies: Variable Speed Limit vs. Jam-Absorption Driving
The main form of freeway traffic congestion is the familiar stop-and-go wave, characterized by wide moving jams that propagate indefinitely upstream provided enough traffic demand. They cause severe, long-lasting adverse effects, such as reduced traffic efficiency, increased driving risks, and higher vehicle emissions. This underscores the crucial importance of artificial intervention in the propagation of stop-and-go waves. Over the past two decades, two prominent strategies for stop-and-go wave suppression have emerged: variable speed limit (VSL) and jam-absorption driving (JAD). Although they share similar research motivations, objectives, and theoretical foundations, the development of these strategies has remained relatively disconnected. To synthesize fragmented advances and drive the field forward, this paper first provides a comprehensive review of the achievements in the stop-and-go wave suppression-oriented VSL and JAD, respectively. It then focuses on bridging the two areas and identifying research opportunities from the following perspectives: fundamental diagrams, secondary waves, generalizability, traffic state estimation and prediction, robustness to randomness, scenarios for strategy validation, and field tests and practical deployment. We expect that through this review, one area can effectively address its limitations by identifying and leveraging the strengths of the other, thus promoting the overall research goal of freeway stop-and-go wave suppression.
Quality-Aware Hydraulic Control in Drinking Water Networks via Controllability Proxies
The operation of water distribution networks simply aims at efficiently delivering consumers adequate water while maintaining safe water quality (WQ). However, this process entails a multi-scale interplay between hydraulic and WQ dynamics evolving spatio-temporally within such a complex infrastructure network. While prior research has addressed the hydraulic optimization problem and WQ regulation as decoupled or coupled, they often overlook control-theoretic guided solutions. This paper takes a novel approach by investigating the coupling between hydraulic and WQ dynamics from a control networks perspective. We propose a quality-aware control framework that embeds WQ controllability metrics into the network-level pump scheduling problem, acknowledging the direct influence of system hydraulics on WQ controller behavior. We examine the trade-offs between pump control energy cost and WQ performance across various network sizes and scenarios. Our results showcase how network topology, hydraulic constraints, and WQ metrics jointly impact optimal pump schedules and, accordingly, the achievable level of WQ regulation, offering insights into designing efficient control strategies for water infrastructure networks governed by interdependent dynamics.
The Role of Integrity Monitoring in Connected and Automated Vehicles: Current State-of-Practice and Future Directions
Positioning integrity refers to the trust in the performance of a navigation system. Accurate and reliable position information is needed to meet the requirements of connected and Automated Vehicle (CAV) applications, particularly in safety-critical scenarios. Receiver Autonomous Integrity Monitoring (RAIM) and its variants have been widely studied for Global Navigation Satellite System (GNSS)-based vehicle positioning, often fused with kinematic (e.g., Odometry) and perception sensors (e.g., camera). However, integrity monitoring (IM) for cooperative positioning solutions leveraging Vehicle-to-Everything (V2X) communication has received comparatively limited attention. This paper reviews existing research in the field of positioning IM and identifies various research gaps. Particular attention has been placed on identifying research that highlights cooperative IM methods. It also examines key automotive safety standards and public V2X datasets to map current research priorities and uncover critical gaps. Finally, the paper outlines promising future directions, highlighting research topics aimed at advancing and benchmarking positioning integrity.
comment: This work has been submitted to the IEEE for possible publication
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent cohesive subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
Underapproximating Safe Domains of Attraction for Discrete-Time Systems Using Implicit Representations of Backward Reachable Sets
Analyzing and certifying stability and attractivity of nonlinear systems is a topic of research interest that has been extensively investigated by control theorists and engineers for many years. Despite that, accurately estimating domains of attraction for nonlinear systems remains a challenging task, where available estimation approaches are either conservative or limited to low-dimensional systems. In this work, we propose an iterative approach to accurately underapproximate safe (i.e., state-constrained) domains of attraction for general discrete-time autonomous nonlinear systems. Our approach relies on implicit representations of safe backward reachable sets of safe regions of attraction, where such regions can be be easily constructed using, e.g., quadratic Lyapunov functions. The iterations of our approach are monotonic (in the sense of set inclusion), where each iteration results in a safe region of attraction, given as a sublevel set, that underapproximates the safe domain of attraction. The sublevel set representations of the resulting regions of attraction can be efficiently utilized in verifying the inclusion of given points of interest in the safe domain of attraction. We illustrate our approach through two numerical examples, involving two- and four-dimensional nonlinear systems.
comment: This updated manuscript corrects errors in the formulas for the bounds used in computing ellipsoidal regions of attraction
Robotics
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
We present GrasMolmo, a generalizable open-vocabulary task-oriented grasping (TOG) model. GraspMolmo predicts semantically appropriate, stable grasps conditioned on a natural language instruction and a single RGB-D frame. For instance, given "pour me some tea", GraspMolmo selects a grasp on a teapot handle rather than its body. Unlike prior TOG methods, which are limited by small datasets, simplistic language, and uncluttered scenes, GraspMolmo learns from PRISM, a novel large-scale synthetic dataset of 379k samples featuring cluttered environments and diverse, realistic task descriptions. We fine-tune the Molmo visual-language model on this data, enabling GraspMolmo to generalize to novel open-vocabulary instructions and objects. In challenging real-world evaluations, GraspMolmo achieves state-of-the-art results, with a 70% prediction success on complex tasks, compared to the 35% achieved by the next best alternative. GraspMolmo also successfully demonstrates the ability to predict semantically correct bimanual grasps zero-shot. We release our synthetic dataset, code, model, and benchmarks to accelerate research in task-semantic robotic manipulation, which, along with videos, are available at https://abhaybd.github.io/GraspMolmo/.
A Practical Guide for Incorporating Symmetry in Diffusion Policy
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization, however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore a number of straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of full equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study on those design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves the policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots
Increased robot deployment, such as in warehousing, has revealed a need for seamless collaboration among heterogeneous robot teams to resolve unforeseen conflicts. To address this challenge, we propose a novel, decentralized framework for robots to request and provide help. The framework begins with robots detecting conflicts using a Vision Language Model (VLM), then reasoning over whether help is needed. If so, it crafts and broadcasts a natural language (NL) help request using a Large Language Model (LLM). Potential helper robots reason over the request and offer help (if able), along with information about impact to their current tasks. Helper reasoning is implemented via an LLM grounded in Signal Temporal Logic (STL) using a Backus-Naur Form (BNF) grammar to guarantee syntactically valid NL-to-STL translations, which are then solved as a Mixed Integer Linear Program (MILP). Finally, the requester robot chooses a helper by reasoning over impact on the overall system. We evaluate our system via experiments considering different strategies for choosing a helper, and find that a requester robot can minimize overall time impact on the system by considering multiple help offers versus simple heuristics (e.g., selecting the nearest robot to help).
Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity
To achieve general-purpose dexterous manipulation, robots must rapidly devise and execute contact-rich behaviors. Existing model-based controllers are incapable of globally optimizing in real-time over the exponential number of possible contact sequences. Instead, recent progress in contact-implicit control has leveraged simpler models that, while still hybrid, make local approximations. However, the use of local models inherently limits the controller to only exploit nearby interactions, potentially requiring intervention to richly explore the space of possible contacts. We present a novel approach which leverages the strengths of local complementarity-based control in combination with low-dimensional, but global, sampling of possible end-effector locations. Our key insight is to consider a contact-free stage preceding a contact-rich stage at every control loop. Our algorithm, in parallel, samples end effector locations to which the contact-free stage can move the robot, then considers the cost predicted by contact-rich MPC local to each sampled location. The result is a globally-informed, contact-implicit controller capable of real-time dexterous manipulation. We demonstrate our controller on precise, non-prehensile manipulation of non-convex objects using a Franka Panda arm. Project page: https://approximating-global-ci-mpc.github.io
comment: S.V. and B.B. contributed equally to this work. Project page: https://approximating-global-ci-mpc.github.io
OPA-Pack: Object-Property-Aware Robotic Bin Packing
Robotic bin packing aids in a wide range of real-world scenarios such as e-commerce and warehouses. Yet, existing works focus mainly on considering the shape of objects to optimize packing compactness and neglect object properties such as fragility, edibility, and chemistry that humans typically consider when packing objects. This paper presents OPA-Pack (Object-Property-Aware Packing framework), the first framework that equips the robot with object property considerations in planning the object packing. Technical-wise, we develop a novel object property recognition scheme with retrieval-augmented generation and chain-of-thought reasoning, and build a dataset with object property annotations for 1,032 everyday objects. Also, we formulate OPA-Net, aiming to jointly separate incompatible object pairs and reduce pressure on fragile objects, while compacting the packing. Further, OPA-Net consists of a property embedding layer to encode the property of candidate objects to be packed, together with a fragility heightmap and an avoidance heightmap to keep track of the packed objects. Then, we design a reward function and adopt a deep Q-learning scheme to train OPA-Net. Experimental results manifest that OPA-Pack greatly improves the accuracy of separating incompatible object pairs (from 52% to 95%) and largely reduces pressure on fragile objects (by 29.4%), while maintaining good packing compactness. Besides, we demonstrate the effectiveness of OPA-Pack on a real packing platform, showcasing its practicality in real-world scenarios.
comment: Submitted to IEEE Transactions on Robotics (TRO) on Feb. 10, 2025
Scalable Importance Sampling in High Dimensions with Low-Rank Mixture Proposals
Importance sampling is a Monte Carlo technique for efficiently estimating the likelihood of rare events by biasing the sampling distribution towards the rare event of interest. By drawing weighted samples from a learned proposal distribution, importance sampling allows for more sample-efficient estimation of rare events or tails of distributions. A common choice of proposal density is a Gaussian mixture model (GMM). However, estimating full-rank GMM covariance matrices in high dimensions is a challenging task due to numerical instabilities. In this work, we propose using mixtures of probabilistic principal component analyzers (MPPCA) as the parametric proposal density for importance sampling methods. MPPCA models are a type of low-rank mixture model that can be fit quickly using expectation-maximization, even in high-dimensional spaces. We validate our method on three simulated systems, demonstrating consistent gains in sample efficiency and quality of failure distribution characterization.
comment: Accepted at CoDIT 2025
Hybrid Voting-Based Task Assignment in Modular Construction Scenarios ICRA 2025
Modular construction, involving off-site prefabrication and on-site assembly, offers significant advantages but presents complex coordination challenges for robotic automation. Effective task allocation is critical for leveraging multi-agent systems (MAS) in these structured environments. This paper introduces the Hybrid Voting-Based Task Assignment (HVBTA) framework, a novel approach to optimizing collaboration between heterogeneous multi-agent construction teams. Inspired by human reasoning in task delegation, HVBTA uniquely integrates multiple voting mechanisms with the capabilities of a Large Language Model (LLM) for nuanced suitability assessment between agent capabilities and task requirements. The framework operates by assigning Capability Profiles to agents and detailed requirement lists called Task Descriptions to construction tasks, subsequently generating a quantitative Suitability Matrix. Six distinct voting methods, augmented by a pre-trained LLM, analyze this matrix to robustly identify the optimal agent for each task. Conflict-Based Search (CBS) is integrated for decentralized, collision-free path planning, ensuring efficient and safe spatio-temporal coordination of the robotic team during assembly operations. HVBTA enables efficient, conflict-free assignment and coordination, facilitating potentially faster and more accurate modular assembly. Current work is evaluating HVBTA's performance across various simulated construction scenarios involving diverse robotic platforms and task complexities. While designed as a generalizable framework for any domain with clearly definable tasks and capabilities, HVBTA will be particularly effective for addressing the demanding coordination requirements of multi-agent collaborative robotics in modular construction due to the predetermined construction planning involved.
comment: Accepted to Block by Block workshop at ICRA 2025
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
Composing Dextrous Grasping and In-hand Manipulation via Scoring with a Reinforcement Learning Critic
In-hand manipulation and grasping are fundamental yet often separately addressed tasks in robotics. For deriving in-hand manipulation policies, reinforcement learning has recently shown great success. However, the derived controllers are not yet useful in real-world scenarios because they often require a human operator to place the objects in suitable initial (grasping) states. Finding stable grasps that also promote the desired in-hand manipulation goal is an open problem. In this work, we propose a method for bridging this gap by leveraging the critic network of a reinforcement learning agent trained for in-hand manipulation to score and select initial grasps. Our experiments show that this method significantly increases the success rate of in-hand manipulation without requiring additional training. We also present an implementation of a full grasp manipulation pipeline on a real-world system, enabling autonomous grasping and reorientation even of unwieldy objects.
Investigating Active Sampling for Hardness Classification with Vision-Based Tactile Sensors
One of the most important object properties that humans and robots perceive through touch is hardness. This paper investigates information-theoretic active sampling strategies for sample-efficient hardness classification with vision-based tactile sensors. We evaluate three probabilistic classifier models and two model-uncertainty-based sampling strategies on a robotic setup as well as on a previously published dataset of samples collected by human testers. Our findings indicate that the active sampling approaches, driven by uncertainty metrics, surpass a random sampling baseline in terms of accuracy and stability. Additionally, while in our human study, the participants achieve an average accuracy of 48.00%, our best approach achieves an average accuracy of 88.78% on the same set of objects, demonstrating the effectiveness of vision-based tactile sensors for object hardness classification.
comment: 7 pages
Interpretable Robotic Friction Learning via Symbolic Regression
Accurately modeling the friction torque in robotic joints has long been challenging due to the request for a robust mathematical description. Traditional model-based approaches are often labor-intensive, requiring extensive experiments and expert knowledge, and they are difficult to adapt to new scenarios and dependencies. On the other hand, data-driven methods based on neural networks are easier to implement but often lack robustness, interpretability, and trustworthiness--key considerations for robotic hardware and safety-critical applications such as human-robot interaction. To address the limitations of both approaches, we propose the use of symbolic regression (SR) to estimate the friction torque. SR generates interpretable symbolic formulas similar to those produced by model-based methods while being flexible to accommodate various dynamic effects and dependencies. In this work, we apply SR algorithms to approximate the friction torque using collected data from a KUKA LWR-IV+ robot. Our results show that SR not only yields formulas with comparable complexity to model-based approaches but also achieves higher accuracy. Moreover, SR-derived formulas can be seamlessly extended to include load dependencies and other dynamic factors.
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning ICML
The goal of offline reinforcement learning (RL) is to extract a high-performance policy from the fixed datasets, minimizing performance degradation due to out-of-distribution (OOD) samples. Offline model-based RL (MBRL) is a promising approach that ameliorates OOD issues by enriching state-action transitions with augmentations synthesized via a learned dynamics model. Unfortunately, seminal offline MBRL methods often struggle in sparse-reward, long-horizon tasks. In this work, we introduce a novel MBRL framework, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates augmented transitions in a temporally structured latent space rather than in raw state space. To model long-horizon behavior, TempDATA learns a latent abstraction that captures a temporal distance from both trajectory and transition levels of state space. Our experiments confirm that TempDATA outperforms previous offline MBRL methods and achieves matching or surpassing the performance of diffusion-based trajectory augmentation and goal-conditioned RL on the D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen.
comment: 2025 ICML
Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing
Diffusion models hold great potential in robotics due to their ability to capture complex, high-dimensional data distributions. However, their lack of constraint-awareness limits their deployment in safety-critical applications. We propose Constraint-Aware Diffusion Guidance (CoDiG), a data-efficient and general-purpose framework that integrates barrier functions into the denoising process, guiding diffusion sampling toward constraint-satisfying outputs. CoDiG enables constraint satisfaction even with limited training data and generalizes across tasks. We evaluate our framework in the challenging setting of miniature autonomous racing, where real-time obstacle avoidance is essential. Real-world experiments show that CoDiG generates safe outputs efficiently under dynamic conditions, highlighting its potential for broader robotic applications. A demonstration video is available at https://youtu.be/KNYsTdtdxOU.
When do Lyapunov Subcenter Manifolds become Eigenmanifolds?
Multi-body mechanical systems have rich internal dynamics, which can be exploited to formulate efficient control targets. For periodic regulation tasks in robotics applications, this motivated the extension of the theory on nonlinear normal modes to Riemannian manifolds, and led to the definition of Eigenmanifolds. This definition is geometric, which is advantageous for generality within robotics but also obscures the connection of Eigenmanifolds to a large body of results from the literature on nonlinear dynamics. We bridge this gap, showing that Eigenmanifolds are instances of Lyapunov subcenter manifolds (LSMs), and that their stronger geometric properties with respect to LSMs follow from a time-symmetry of conservative mechanical systems. This directly leads to local existence and uniqueness results for Eigenmanifolds. Furthermore, we show that an additional spatial symmetry provides Eigenmanifolds with yet stronger properties of Rosenberg manifolds, which can be favorable for control applications, and we present a sufficient condition for their existence and uniqueness. These theoretical results are numerically confirmed on two mechanical systems with a non-constant inertia tensor: a double pendulum and a 5-link pendulum.
comment: 22 pages, 24 figures, submitted to Automatica
Disentangling Coordiante Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers
Task performance in terms of task completion time in teleoperation is still far behind compared to humans conducting tasks directly. One large identified impact on this is the human capability to perform transformations and alignments, which is directly influenced by the point of view and the motion retargeting strategy. In modern teleoperation systems, motion retargeting is usually implemented through a one time calibration or switching modes. Complex tasks, like concatenated screwing, might be difficult, because the operator has to align (e.g. mirror) rotational and translational input commands. Recent research has shown, that the separation of translation and rotation leads to increased task performance. This work proposes a formal motion retargeting method, which separates translational and rotational input commands. This method is then included in a optimal control based trajectory planner and shown to work on a UR5e manipulator.
comment: 8 pages, 4 figures, conference
Granular Loco-Manipulation: Repositioning Rocks Through Strategic Sand Avalanche
Legged robots have the potential to leverage obstacles to climb steep sand slopes. However, efficiently repositioning these obstacles to desired locations is challenging. Here we present DiffusiveGRAIN, a learning-based method that enables a multi-legged robot to strategically induce localized sand avalanches during locomotion and indirectly manipulate obstacles. We conducted 375 trials, systematically varying obstacle spacing, robot orientation, and leg actions in 75 of them. Results show that the movement of closely-spaced obstacles exhibits significant interference, requiring joint modeling. In addition, different multi-leg excavation actions could cause distinct robot state changes, necessitating integrated planning of manipulation and locomotion. To address these challenges, DiffusiveGRAIN includes a diffusion-based environment predictor to capture multi-obstacle movements under granular flow interferences and a robot state predictor to estimate changes in robot state from multi-leg action patterns. Deployment experiments (90 trials) demonstrate that by integrating the environment and robot state predictors, the robot can autonomously plan its movements based on loco-manipulation goals, successfully shifting closely located rocks to desired locations in over 65% of trials. Our study showcases the potential for a locomoting robot to strategically manipulate obstacles to achieve improved mobility on challenging terrains.
AGI-Elo: How Far Are We From Mastering A Task?
As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating distributions offer novel perspectives and interpretable insights into task difficulty, model progression, and the outstanding challenges that remain on the path to achieving full AGI task mastery.
Practical Equivalence Testing and Its Application in Synthetic Pre-Crash Scenario Validation
The use of representative pre-crash scenarios is critical for assessing the safety impact of driving automation systems through simulation. However, a gap remains in the robust evaluation of the similarity between synthetic and real-world pre-crash scenarios and their crash characteristics. Without proper validation, it cannot be ensured that the synthetic test scenarios adequately represent real-world driving behaviors and crash characteristics. One reason for this validation gap is the lack of focus on methods to confirm that the synthetic test scenarios are practically equivalent to real-world ones, given the assessment scope. Traditional statistical methods, like significance testing, focus on detecting differences rather than establishing equivalence; since failure to detect a difference does not imply equivalence, they are of limited applicability for validating synthetic pre-crash scenarios and crash characteristics. This study addresses this gap by proposing an equivalence testing method based on the Bayesian Region of Practical Equivalence (ROPE) framework. This method is designed to assess the practical equivalence of scenario characteristics that are most relevant for the intended assessment, making it particularly appropriate for the domain of virtual safety assessments. We first review existing equivalence testing methods. Then we propose and demonstrate the Bayesian ROPE-based method by testing the equivalence of two rear-end pre-crash datasets. Our approach focuses on the most relevant scenario characteristics. Our analysis provides insights into the practicalities and effectiveness of equivalence testing in synthetic test scenario validation and demonstrates the importance of testing for improving the credibility of synthetic data for automated vehicle safety assessment, as well as the credibility of subsequent safety impact assessments.
MOON: Multi-Objective Optimization-Driven Object-Goal Navigation Using a Variable-Horizon Set-Orienteering Planner
Object-goal navigation (ON) enables autonomous robots to locate and reach user-specified objects in previously unknown environments, offering promising applications in domains such as assistive care and disaster response. Existing ON methods -- including training-free approaches, reinforcement learning, and zero-shot planners -- generally depend on active exploration to identify landmark objects (e.g., kitchens or desks), followed by navigation toward semantically related targets (e.g., a specific mug). However, these methods often lack strategic planning and do not adequately address trade-offs among multiple objectives. To overcome these challenges, we propose a novel framework that formulates ON as a multi-objective optimization problem (MOO), balancing frontier-based knowledge exploration with knowledge exploitation over previously observed landmarks; we call this framework MOON (MOO-driven ON). We implement a prototype MOON system that integrates three key components: (1) building on QOM [IROS05], a classical ON system that compactly and discriminatively encodes landmarks based on their semantic relevance to the target; (2) integrating StructNav [RSS23], a recently proposed training-free planner, to enhance the navigation pipeline; and (3) introducing a variable-horizon set orienteering problem formulation to enable global optimization over both exploration and exploitation strategies. This work represents an important first step toward developing globally optimized, next-generation object-goal navigation systems.
comment: 6 pages, technical report
TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation
Teleoperation is a cornerstone of embodied-robot learning, and bimanual dexterous teleoperation in particular provides rich demonstrations that are difficult to obtain with fully autonomous systems. While recent studies have proposed diverse hardware pipelines-ranging from inertial motion-capture gloves to exoskeletons and vision-based interfaces-there is still no unified benchmark that enables fair, reproducible comparison of these systems. In this paper, we introduce TeleOpBench, a simulator-centric benchmark tailored to bimanual dexterous teleoperation. TeleOpBench contains 30 high-fidelity task environments that span pick-and-place, tool use, and collaborative manipulation, covering a broad spectrum of kinematic and force-interaction difficulty. Within this benchmark we implement four representative teleoperation modalities-(i) MoCap, (ii) VR device, (iii) arm-hand exoskeletons, and (iv) monocular vision tracking-and evaluate them with a common protocol and metric suite. To validate that performance in simulation is predictive of real-world behavior, we conduct mirrored experiments on a physical dual-arm platform equipped with two 6-DoF dexterous hands. Across 10 held-out tasks we observe a strong correlation between simulator and hardware performance, confirming the external validity of TeleOpBench. TeleOpBench establishes a common yardstick for teleoperation research and provides an extensible platform for future algorithmic and hardware innovation.
comment: 13 pages
DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories
We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories - synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of familiar or novel tasks in diverse environments. Since these models generate only videos, we recover pseudo-action sequences using either a latent action model or an inverse-dynamics model (IDM). Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment. To evaluate the pipeline systematically, we introduce DreamGen Bench, a video generation benchmark that shows a strong correlation between benchmark performance and downstream policy success. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection.
comment: See website for videos: https://research.nvidia.com/labs/gear/dreamgen
Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion
Humanoid soccer dribbling is a highly challenging task that demands dexterous ball manipulation while maintaining dynamic balance. Traditional rule-based methods often struggle to achieve accurate ball control due to their reliance on fixed walking patterns and limited adaptability to real-time ball dynamics. To address these challenges, we propose a two-stage curriculum learning framework that enables a humanoid robot to acquire dribbling skills without explicit dynamics or predefined trajectories. In the first stage, the robot learns basic locomotion skills; in the second stage, we fine-tune the policy for agile dribbling maneuvers. We further introduce a virtual camera model in simulation and design heuristic rewards to encourage active sensing, promoting a broader visual range for continuous ball perception. The policy is trained in simulation and successfully transferred to a physical humanoid robot. Experimental results demonstrate that our method enables effective ball manipulation, achieving flexible and visually appealing dribbling behaviors across multiple environments. This work highlights the potential of reinforcement learning in developing agile humanoid soccer robots. Additional details, video demonstrations, and code are available at https://zhuoheng0910.github.io/dribble-master/.
Audio-Visual Contact Classification for Tree Structures in Agriculture
Contact-rich manipulation tasks in agriculture, such as pruning and harvesting, require robots to physically interact with tree structures to maneuver through cluttered foliage. Identifying whether the robot is contacting rigid or soft materials is critical for the downstream manipulation policy to be safe, yet vision alone is often insufficient due to occlusion and limited viewpoints in this unstructured environment. To address this, we propose a multi-modal classification framework that fuses vibrotactile (audio) and visual inputs to identify the contact class: leaf, twig, trunk, or ambient. Our key insight is that contact-induced vibrations carry material-specific signals, making audio effective for detecting contact events and distinguishing material types, while visual features add complementary semantic cues that support more fine-grained classification. We collect training data using a hand-held sensor probe and demonstrate zero-shot generalization to a robot-mounted probe embodiment, achieving an F1 score of 0.82. These results underscore the potential of audio-visual learning for manipulation in unstructured, contact-rich environments.
comment: 8 pages
Digital Twins in the Cloud: A Modular, Scalable and Interoperable Framework for Accelerating Verification and Validation of Autonomous Driving Solutions
Verification and validation (V&V) of autonomous vehicles (AVs) typically requires exhaustive testing across a variety of operating environments and driving scenarios including rare, extreme, or hazardous situations that might be difficult or impossible to capture in reality. Additionally, physical V&V methods such as track-based evaluations or public-road testing are often constrained by time, cost, and safety, which motivates the need for virtual proving grounds. However, the fidelity and scalability of simulation-based V&V methods can quickly turn into a bottleneck. In such a milieu, this work proposes a virtual proving ground that flexibly scales digital twins within high-performance computing clusters (HPCCs) and automates the V&V process. Here, digital twins enable high-fidelity virtual representation of the AV and its operating environments, allowing extensive scenario-based testing. Meanwhile, HPCC infrastructure brings substantial advantages in terms of computational power and scalability, enabling rapid iterations of simulations, processing and storage of massive amounts of data, and deployment of large-scale test campaigns, thereby reducing the time and cost associated with the V&V process. We demonstrate the efficacy of this approach through a case study that focuses on the variability analysis of a candidate autonomy algorithm to identify potential vulnerabilities in its perception, planning, and control sub-systems. The modularity, scalability, and interoperability of the proposed framework are demonstrated by deploying a test campaign comprising 256 test cases on two different HPCC architectures to ensure continuous operation in a publicly shared resource setting. The findings highlight the ability of the proposed framework to accelerate and streamline the V&V process, thereby significantly compressing (~30x) the timeline.
comment: Accepted at ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (IDETC-CIE) 2025
The Robot of Theseus: A modular robotic testbed for legged locomotion
Robotic models are useful for independently varying specific features, but most quadrupedal robots differ so greatly from animal morphologies that they have minimal biomechanical relevance. Commercially available quadrupedal robots are also prohibitively expensive for biological research programs and difficult to customize. Here, we present a low-cost quadrupedal robot with modular legs that can match a wide range of animal morphologies for biomechanical hypothesis testing. The Robot Of Theseus (TROT) costs approximately $4000 to build out of 3D printed parts and standard off-the-shelf supplies. Each limb consists of 2 or 3 rigid links; the proximal joint can be rotated to become a knee or elbow. Telescoping mechanisms vary the length of each limb link. The open-source software accommodates user-defined gaits and morphology changes. Effective leg length, or crouch, is determined by the four-bar linkage actuating each joint. The backdrivable motors can vary virtual spring stiffness and range of motion. Full descriptions of the TROT hardware and software are freely available online. We demonstrate the use of TROT to compare locomotion among extant, extinct, and theoretical morphologies. In addition to biomechanical hypothesis testing, we envision a variety of different applications for this low-cost, modular, legged robotic platform, including developing novel control strategies, clearing land mines, or remote exploration. All CAD and code is available for download on the TROT project page.
SafeMove-RL: A Certifiable Reinforcement Learning Framework for Dynamic Motion Constraints in Trajectory Planning
This study presents a dynamic safety margin-based reinforcement learning framework for local motion planning in dynamic and uncertain environments. The proposed planner integrates real-time trajectory optimization with adaptive gap analysis, enabling effective feasibility assessment under partial observability constraints. To address safety-critical computations in unknown scenarios, an enhanced online learning mechanism is introduced, which dynamically corrects spatial trajectories by forming dynamic safety margins while maintaining control invariance. Extensive evaluations, including ablation studies and comparisons with state-of-the-art algorithms, demonstrate superior success rates and computational efficiency. The framework's effectiveness is further validated on both simulated and physical robotic platforms.
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
EndoForce: Development of an Intuitive Axial Force Measurement Device for Endoscopic Robotic Systems
Robotic endoscopic systems provide intuitive control and eliminate radiation exposure, making them a promising alternative to conventional methods. However, the lack of axial force measurement from the robot remains a major challenge, as it can lead to excessive colonic elongation, perforation, or ureteral complications. Although various methods have been proposed in previous studies, limitations such as model dependency, bulkiness, and environmental sensitivity remain challenges that should be addressed before clinical application. In this study, we propose EndoForce, a device designed for intuitive and accurate axial force measurement in endoscopic robotic systems. Inspired by the insertion motion performed by medical doctors during ureteroscopy and gastrointestinal (GI) endoscopy, EndoForce ensures precise force measuring while maintaining compatibility with clinical environments. The device features a streamlined design, allowing for the easy attachment and detachment of a sterile cover, and incorporates a commercial load cell to enhance cost-effectiveness and facilitate practical implementation in real medical applications. To validate the effectiveness of the proposed EndoForce, physical experiments were performed using a testbed that simulates the ureter. We show that the axial force generated during insertion was measured with high accuracy, regardless of whether the pathway was straight or curved, in a testbed simulating the human ureter.
A Comprehensive Survey on Physical Risk Control in the Era of Foundation Model-enabled Robotics IJCAI 2025
Recent Foundation Model-enabled robotics (FMRs) display greatly improved general-purpose skills, enabling more adaptable automation than conventional robotics. Their ability to handle diverse tasks thus creates new opportunities to replace human labor. However, unlike general foundation models, FMRs interact with the physical world, where their actions directly affect the safety of humans and surrounding objects, requiring careful deployment and control. Based on this proposition, our survey comprehensively summarizes robot control approaches to mitigate physical risks by covering all the lifespan of FMRs ranging from pre-deployment to post-accident stage. Specifically, we broadly divide the timeline into the following three phases: (1) pre-deployment phase, (2) pre-incident phase, and (3) post-incident phase. Throughout this survey, we find that there is much room to study (i) pre-incident risk mitigation strategies, (ii) research that assumes physical interaction with humans, and (iii) essential issues of foundation models themselves. We hope that this survey will be a milestone in providing a high-resolution analysis of the physical risks of FMRs and their control, contributing to the realization of a good human-robot relationship.
comment: Accepted to IJCAI 2025 Survey Track
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
SayCoNav: Utilizing Large Language Models for Adaptive Collaboration in Decentralized Multi-Robot Navigation
Adaptive collaboration is critical to a team of autonomous robots to perform complicated navigation tasks in large-scale unknown environments. An effective collaboration strategy should be determined and adapted according to each robot's skills and current status to successfully achieve the shared goal. We present SayCoNav, a new approach that leverages large language models (LLMs) for automatically generating this collaboration strategy among a team of robots. Building on the collaboration strategy, each robot uses the LLM to generate its plans and actions in a decentralized way. By sharing information to each other during navigation, each robot also continuously updates its step-by-step plans accordingly. We evaluate SayCoNav on Multi-Object Navigation (MultiON) tasks, that require the team of the robots to utilize their complementary strengths to efficiently search multiple different objects in unknown environments. By validating SayCoNav with varied team compositions and conditions against baseline methods, our experimental results show that SayCoNav can improve search efficiency by at most 44.28% through effective collaboration among heterogeneous robots. It can also dynamically adapt to the changing conditions during task execution.
Practice Makes Perfect: A Study of Digital Twin Technology for Assembly and Problem-solving using Lunar Surface Telerobotics
Robotic systems that can traverse planetary or lunar surfaces to collect environmental data and perform physical manipulation tasks, such as assembling equipment or conducting mining operations, are envisioned to form the backbone of future human activities in space. However, the environmental conditions in which these robots, or "rovers," operate present challenges toward achieving fully autonomous solutions, meaning that rover missions will require some degree of human teleoperation or supervision for the foreseeable future. As a result, human operators require training to successfully direct rovers and avoid costly errors or mission failures, as well as the ability to recover from any issues that arise on the fly during mission activities. While analog environments, such as JPL's Mars Yard, can help with such training by simulating surface environments in the real world, access to such resources may be rare and expensive. As an alternative or supplement to such physical analogs, we explore the design and evaluation of a virtual reality digital twin system to train human teleoperation of robotic rovers with mechanical arms for space mission activities. We conducted an experiment with 24 human operators to investigate how our digital twin system can support human teleoperation of rovers in both pre-mission training and in real-time problem solving in a mock lunar mission in which users directed a physical rover in the context of deploying dipole radio antennas. We found that operators who first trained with the digital twin showed a 28% decrease in mission completion time, an 85% decrease in unrecoverable errors, as well as improved mental markers, including decreased cognitive load and increased situation awareness.
Dynamic Bipedal MPC with Foot-level Obstacle Avoidance and Adjustable Step Timing
Collision-free planning is essential for bipedal robots operating within unstructured environments. This paper presents a real-time Model Predictive Control (MPC) framework that addresses both body and foot avoidance for dynamic bipedal robots. Our contribution is two-fold: we introduce (1) a novel formulation for adjusting step timing to facilitate faster body avoidance and (2) a novel 3D foot-avoidance formulation that implicitly selects swing trajectories and footholds that either steps over or navigate around obstacles with awareness of Center of Mass (COM) dynamics. We achieve body avoidance by applying a half-space relaxation of the safe region but introduce a switching heuristic based on tracking error to detect a need to change foot-timing schedules. To enable foot avoidance and viable landing footholds on all sides of foot-level obstacles, we decompose the non-convex safe region on the ground into several convex polygons and use Mixed-Integer Quadratic Programming to determine the optimal candidate. We found that introducing a soft minimum-travel-distance constraint is effective in preventing the MPC from being trapped in local minima that can stall half-space relaxation methods behind obstacles. We demonstrated the proposed algorithms on multibody simulations on the bipedal robot platforms, Cassie and Digit, as well as hardware experiments on Digit.
Risk-Averse Traversal of Graphs with Stochastic and Correlated Edge Costs for Safe Global Planetary Mobility
In robotic planetary surface exploration, strategic mobility planning is an important task that involves finding candidate long-distance routes on orbital maps and identifying segments with uncertain traversability. Then, expert human operators establish safe, adaptive traverse plans based on the actual navigation difficulties encountered in these uncertain areas. In this paper, we formalize this challenge as a new, risk-averse variant of the Canadian Traveller Problem (CTP) tailored to global planetary mobility. The objective is to find a traverse policy minimizing a conditional value-at-risk (CVaR) criterion, which is a risk measure with an intuitive interpretation. We propose a novel search algorithm that finds exact CVaR-optimal policies. Our approach leverages well-established optimal AND-OR search techniques intended for (risk-agnostic) expectation minimization and extends these methods to the risk-averse domain. We validate our approach through simulated long-distance planetary surface traverses; we employ real orbital maps of the Martian surface to construct problem instances and use terrain maps to express traversal probabilities in uncertain regions. Our results illustrate different adaptive decision-making schemes depending on the level of risk aversion. Additionally, our problem setup allows accounting for traversability correlations between similar areas of the environment. In such a case, we empirically demonstrate how information-seeking detours can mitigate risk.
comment: Submitted to the Autonomous Robots journal
GeoVLM: Improving Automated Vehicle Geolocalisation Using Vision-Language Matching
Cross-view geo-localisation identifies coarse geographical position of an automated vehicle by matching a ground-level image to a geo-tagged satellite image from a database. Despite the advancements in Cross-view geo-localisation, significant challenges still persist such as similar looking scenes which makes it challenging to find the correct match as the top match. Existing approaches reach high recall rates but they still fail to rank the correct image as the top match. To address this challenge, this paper proposes GeoVLM, a novel approach which uses the zero-shot capabilities of vision language models to enable cross-view geo-localisation using interpretable cross-view language descriptions. GeoVLM is a trainable reranking approach which improves the best match accuracy of cross-view geo-localisation. GeoVLM is evaluated on standard benchmark VIGOR and University-1652 and also through real-life driving environments using Cross-View United Kingdom, a new benchmark dataset introduced in this paper. The results of the paper show that GeoVLM improves retrieval performance of cross-view geo-localisation compared to the state-of-the-art methods with the help of explainable natural language descriptions. The code is available at https://github.com/CAV-Research-Lab/GeoVLM
Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods. In this work, we propose Adaptive Diffusion Constrained Sampling (ADCS), a generative framework that flexibly integrates both equality (e.g., relative and absolute pose constraints) and structured inequality constraints (e.g., proximity to object surfaces) into an energy-based diffusion model. Equality constraints are modeled using dedicated energy networks trained on pose differences in Lie algebra space, while inequality constraints are represented via Signed Distance Functions (SDFs) and encoded into learned constraint embeddings, allowing the model to reason about complex spatial regions. A key innovation of our method is a Transformer-based architecture that learns to weight constraint-specific energy functions at inference time, enabling flexible and context-aware constraint integration. Moreover, we adopt a two-phase sampling strategy that improves precision and sample diversity by combining Langevin dynamics with resampling and density-aware re-weighting. Experimental results on dual-arm manipulation tasks show that ADCS significantly improves sample diversity and generalization across settings demanding precise coordination and adaptive constraint handling.
Learning Collision Risk from Naturalistic Driving with Generalised Surrogate Safety Measures
Accurate and timely alerts for drivers or automated systems to unfolding collisions remains a challenge in road safety, particularly in highly interactive urban traffic. Existing approaches require labour-intensive annotation of sparse risk, struggle to consider varying interaction context, or are useful only in the scenarios they are designed for. To address these limits, this study introduces the generalised surrogate safety measure (GSSM), a new approach that learns exclusively from naturalistic driving without crash or risk labels. GSSM captures the patterns of normal driving and estimates the extent to which a traffic interaction deviates from the norm towards unsafe extreme. Utilising neural networks, normal interactions are characterised by context-conditioned distributions of multi-directional spacing between road users. In the same interaction context, a spacing closer than normal entails higher risk of potential collision. Then a context-adaptive risk score and its associated probability can be calculated based on the theory of extreme values. Any measurable factors, such as motion kinematics, weather, lighting, can serve as part of the context, allowing for diverse coverage of safety-critical interactions. Multiple public driving datasets are used to train GSSMs, which are tested with 4,875 real-world crashes and near-crashes reconstructed from the SHRP2 NDS. A vanilla GSSM using only instantaneous states achieves AUPRC of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions. Additional data and contextual factors provide further performance gains. Across various interaction types such as rear-end, merging, and crossing, the accuracy and timeliness of GSSM consistently outperforms existing baselines. GSSM therefore establishes a scalable, context-aware, and generalisable foundation to proactively quantify collision risk in traffic interactions.
comment: 18 pages, 8 figures
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
Robot learning in high-dimensional control settings, such as humanoid locomotion, presents persistent challenges for reinforcement learning (RL) algorithms due to unstable dynamics, complex contact interactions, and sensitivity to distributional shifts during training. Model-based methods, \textit{e.g.}, Temporal-Difference Model Predictive Control (TD-MPC), have demonstrated promising results by combining short-horizon planning with value-based learning, enabling efficient solutions for basic locomotion tasks. However, these approaches remain ineffective in addressing policy mismatch and instability introduced by off-policy updates. Thus, in this work, we introduce Temporal-Difference Group Relative Policy Constraint (TD-GRPC), an extension of the TD-MPC framework that unifies Group Relative Policy Optimization (GRPO) with explicit Policy Constraints (PC). TD-GRPC applies a trust-region constraint in the latent policy space to maintain consistency between the planning priors and learned rollouts, while leveraging group-relative ranking to assess and preserve the physical feasibility of candidate trajectories. Unlike prior methods, TD-GRPC achieves robust motions without modifying the underlying planner, enabling flexible planning and policy learning. We validate our method across a locomotion task suite ranging from basic walking to highly dynamic movements on the 26-DoF Unitree H1-2 humanoid robot. Through simulation results, TD-GRPC demonstrates its improvements in stability and policy robustness with sampling efficiency while training for complex humanoid control tasks.
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, with a focus on physical common sense and embodied reasoning. To represent physical common sense, we use a hierarchical ontology that captures fundamental knowledge about space, time, and physics. For embodied reasoning, we rely on a two-dimensional ontology that generalizes across different physical embodiments. Building on these capabilities, we develop two multimodal large language models, Cosmos-Reason1-7B and Cosmos-Reason1-56B. We curate data and train our models in two stages: Physical AI supervised fine-tuning (SFT) and Physical AI reinforcement learning (RL). To evaluate our models, we build comprehensive benchmarks for physical common sense and embodied reasoning according to our ontologies. Evaluation results show that Physical AI SFT and RL bring significant improvements. To facilitate the development of Physical AI, we make our code and pre-trained models available under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-reason1.
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
Robot task planning decomposes human instructions into executable action sequences that enable robots to complete a series of complex tasks. Although recent large language model (LLM)-based task planners achieve amazing performance, they assume that human instructions are clear and straightforward. However, real-world users are not experts, and their instructions to robots often contain significant vagueness. Linguists suggest that such vagueness frequently arises from referring expressions (REs), whose meanings depend heavily on dialogue context and environment. This vagueness is even more prevalent among the elderly and children, who robots should serve more. This paper studies how such vagueness in REs within human instructions affects LLM-based robot task planning and how to overcome this issue. To this end, we propose the first robot task planning benchmark with vague REs (REI-Bench), where we discover that the vagueness of REs can severely degrade robot planning performance, leading to success rate drops of up to 77.9%. We also observe that most failure cases stem from missing objects in planners. To mitigate the REs issue, we propose a simple yet effective approach: task-oriented context cognition, which generates clear instructions for robots, achieving state-of-the-art performance compared to aware prompt and chains of thought. This work contributes to the research community of human-robot interaction (HRI) by making robot task planning more practical, particularly for non-expert users, e.g., the elderly and children.
comment: Under Review
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
Rapid and Inexpensive Inertia Tensor Estimation from a Single Object Throw
The inertia tensor is an important parameter in many engineering fields, but measuring it can be cumbersome and involve multiple experiments or accurate and expensive equipment. We propose a method to measure the moment of inertia tensor of a rigid body from a single spinning throw, by attaching a small and inexpensive stand-alone measurement device consisting of a gyroscope, accelerometer and a reaction wheel. The method includes a compensation for the increase of moment of inertia due to adding the measurement device to the body, and additionally obtains the location of the centre of gravity of the body as an intermediate result. Experiments performed with known rigid bodies show that the mean accuracy is around 2%.
comment: \c{opyright} 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Quantifying Context Bias in Domain Adaptation for Object Detection
Domain adaptation for object detection (DAOD) seeks to transfer a trained model from a source to a target domain. Various DAOD methods exist, some of which aim to minimize context bias between foreground-background associations in various domains. However, no prior work has studied context bias in DAOD by analyzing changes in background features during adaptation and how context bias is represented in different domains. Our research experiment highlights the potential usability of context bias in DAOD. We address the problem by varying activation values over different layers of two different trained models, Detectron2 and YOLOv11, and by masking the background, both of which impact the number and quality of detections. We use two synthetic datasets, CARLA and Virtual KITTI, and two different versions of real open-source data, Cityscapes and KITTI semantic, as separate domains to represent and quantify context bias. We utilize different metrics such as Maximum Mean Discrepancy (MMD) and Maximum Variance Discrepancy (MVD) to find the layer-specific conditional probability estimates of foreground given manipulated background regions for separate domains. We further analyze foreground-background associations across various dataset combinations. We find that state-of-the-art domain adaptation methods exhibit some form of context bias and apply a potentially simple way to alleviate the context bias achieving improved accuracy (from 51.189 to 53.646 mAP on Cityscapes foggy validation with 63.207 mAP and 64.233 mAP on Cityscapes validation respectively). We demonstrate through detailed analysis that understanding of the context bias can affect DAOD approach and focusing solely on aligning foreground features is insufficient for effective DAOD.
comment: Under review
FlightBench: Benchmarking Learning-based Methods for Ego-vision-based Quadrotors Navigation
Ego-vision-based navigation in cluttered environments is crucial for mobile systems, particularly agile quadrotors. While learning-based methods have shown promise recently, head-to-head comparisons with cutting-edge optimization-based approaches are scarce, leaving open the question of where and to what extent they truly excel. In this paper, we introduce FlightBench, the first comprehensive benchmark that implements various learning-based methods for ego-vision-based navigation and evaluates them against mainstream optimization-based baselines using a broad set of performance metrics. More importantly, we develop a suite of criteria to assess scenario difficulty and design test cases that span different levels of difficulty based on these criteria. Our results show that while learning-based methods excel in high-speed flight and faster inference, they struggle with challenging scenarios like sharp corners or view occlusion. Analytical experiments validate the correlation between our difficulty criteria and flight performance. Moreover, we verify the trend in flight performance within real-world environments through full-pipeline and hardware-in-the-loop experiments. We hope this benchmark and these criteria will drive future advancements in learning-based navigation for ego-vision quadrotors. Code and documentation are available at https://github.com/thu-uav/FlightBench.
comment: The first three authors contribute equally
Quantization-Free Autoregressive Action Transformer
Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. However, the initial quantization breaks the continuous structure of the action space thereby limiting the capabilities of the generative model. We propose a quantization-free method instead that leverages Generative Infinite-Vocabulary Transformers (GIVT) as a direct, continuous policy parametrization for autoregressive transformers. This simplifies the imitation learning pipeline while achieving state-of-the-art performance on a variety of popular simulated robotics tasks. We enhance our policy roll-outs by carefully studying sampling algorithms, further improving the results.
Infinigen-Sim: Procedural Generation of Articulated Simulation Assets
We introduce Infinigen-Sim, a toolkit which enables users to create diverse and realistic articulated object procedural generators. These tools are composed of high-level utilities for use creating articulated assets in Blender, as well as an export pipeline to integrate the resulting assets into common robotics simulators. We demonstrate our system by creating procedural generators for 5 common articulated object categories. Experiments show that assets sampled from these generators are useful for movable object segmentation, training generalizable reinforcement learning policies, and sim-to-real transfer of imitation learning policies.
FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots
Reinforcement learning (RL) has made significant strides in legged robot control, enabling locomotion across diverse terrains and complex loco-manipulation capabilities. However, the commonly used position or velocity tracking-based objectives are agnostic to forces experienced by the robot, leading to stiff and potentially dangerous behaviors and poor control during forceful interactions. To address this limitation, we present \emph{Force-Adaptive Control via Impedance Reference Tracking} (FACET). Inspired by impedance control, we use RL to train a control policy to imitate a virtual mass-spring-damper system, allowing fine-grained control under external forces by manipulating the virtual spring. In simulation, we demonstrate that our quadruped robot achieves improved robustness to large impulses (up to 200 Ns) and exhibits controllable compliance, achieving an 80% reduction in collision impulse. The policy is deployed to a physical robot to showcase both compliance and the ability to engage with large forces by kinesthetic control and pulling payloads up to 2/3 of its weight. Further extension to a legged loco-manipulator and a humanoid shows the applicability of our method to more complex settings to enable whole-body compliance control. Project Website: https://facet.pages.dev/
Efficient Multi-robot Active SLAM
Autonomous exploration in unknown environments remains a fundamental challenge in robotics, particularly for applications such as search and rescue, industrial inspection, and planetary exploration. Multi-robot active SLAM presents a promising solution by enabling collaborative mapping and exploration while actively reducing uncertainty. However, existing approaches often suffer from high computational costs and inefficient frontier management, making them computationally expensive for real-time applications. In this paper, we introduce an efficient multi-robot active SLAM framework that incorporates a frontier-sharing strategy to enhance robot distribution in unexplored environments. Our approach integrates a utility function that considers both pose graph uncertainty and path entropy, achieving an optimal balance between exploration coverage and computational efficiency. By filtering and prioritizing goal frontiers, our method significantly reduces computational overhead while preserving high mapping accuracy. The proposed framework has been implemented in ROS and validated through simulations and real-world experiments. Results demonstrate superior exploration performance and mapping quality compared to state-of-the-art approaches.
comment: 31 pages, 15 figures
Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network that implicitly aligns and processes multiple local frames for correcting onboard model errors and consistently enhancing occupancy accuracy across all distances. 2) the region-centric global propagation, focuses on refining labels using explicit multi-view geometry and integrating sensor bias, particularly for increasing the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state-of-the-art performance on the SemanticKITTI dataset. Notably, OccFiner significantly boosts the performance of vision-based SSC models, achieving accuracy levels competitive with established LiDAR-based onboard SSC methods. Furthermore, OccFiner is the first to achieve automatic annotation of SSC in a purely vision-based approach. Quantitative experiments prove that OccFiner successfully facilitates occupancy data loop-closure in autonomous driving. Additionally, we quantitatively and qualitatively validate the superiority of the offboard approach on city-level SSC static maps. The source code will be made publicly available at https://github.com/MasterHow/OccFiner.
comment: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code will be made publicly available at https://github.com/MasterHow/OccFiner
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Generative skill acquisition enables embodied agents to actively learn a scalable and evolving repertoire of control skills, crucial for the advancement of large decision models. While prior approaches often rely on supervision signals from generalist agents (e.g., LLMs), their effectiveness in complex 3D environments remains unclear; exhaustive evaluation incurs substantial computational costs, significantly hindering the efficiency of skill learning. Inspired by recent successes in verification models for mathematical reasoning, we propose VERGSA (Verifying Embodied Reasoning in Generative Skill Acquisition), a framework that systematically integrates real-time verification principles into embodied skill learning. VERGSA establishes 1) a seamless extension from verification of mathematical reasoning into embodied learning by dynamically incorporating contextually relevant tasks into prompts and defining success metrics for both subtasks and overall tasks, and 2) an automated, scalable reward labeling scheme that synthesizes dense reward signals by iteratively finalizing the contribution of scene configuration and subtask learning to overall skill acquisition. To the best of our knowledge, this approach constitutes the first comprehensive training dataset for verification-driven generative skill acquisition, eliminating arduous manual reward engineering. Experiments validate the efficacy of our approach: 1) the exemplar task pool improves the average task success rates by 21%, 2) our verification model boosts success rates by 24% for novel tasks and 36% for encountered tasks, and 3) outperforms LLM-as-a-Judge baselines in verification quality.
Loop closure grasping: Topological transformations enable strong, gentle, and versatile grasps
Grasping mechanisms must both create and subsequently hold grasps that permit safe and effective object manipulation. Existing mechanisms address the different functional requirements of grasp creation and grasp holding using a single morphology, but have yet to achieve the simultaneous strength, gentleness, and versatility needed for many applications. We present "loop closure grasping", a class of robotic grasping that addresses these different functional requirements through topological transformations between open-loop and closed-loop morphologies. We formalize these morphologies for grasping, formulate the loop closure grasping method, and present principles and a design architecture that we implement using soft growing inflated beams, winches, and clamps. The mechanisms' initial open-loop topology enables versatile grasp creation via unencumbered tip movement, and closing the loop enables strong and gentle holding with effectively infinite bending compliance. Loop closure grasping circumvents the tradeoffs of single-morphology designs, enabling grasps involving historically challenging objects, environments, and configurations.
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
In this paper, we claim that spatial understanding is the keypoint in robot manipulation, and propose SpatialVLA to explore effective spatial representations for the robot foundation model. Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating learning generalizable and transferrable spatial action knowledge for cross-robot control. SpatialVLA is first pre-trained on top of a vision-language model with 1.1 Million real-world robot episodes, to learn a generalist manipulation policy across multiple robot environments and tasks. After pre-training, SpatialVLA is directly applied to perform numerous tasks in a zero-shot manner. The superior results in both simulation and real-world robots demonstrate its advantage of inferring complex robot motion trajectories and its strong in-domain multi-task generalization ability. We further show the proposed Adaptive Action Grids offer a new and effective way to fine-tune the pre-trained SpatialVLA model for new simulation and real-world setups, where the pre-learned action grids are re-discretized to capture robot-specific spatial action movements of new setups. The superior results from extensive evaluations demonstrate the exceptional in-distribution generalization and out-of-distribution adaptation capability, highlighting the crucial benefit of the proposed spatial-aware representations for generalist robot policy learning. All the details and codes will be open-sourced.
Building reliable sim driving agents by scaling self-play
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable sound experimentation, a simulation agent must behave as intended. It should minimize actions that may lead to undesired outcomes, such as collisions, which can distort the signal-to-noise ratio in analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents solve almost the full training set within a day. They generalize to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in such cases. We open-source the pre-trained agents and integrate them with a batched multi-agent simulator. Demonstrations of agent behaviors can be viewed at https://sites.google.com/view/reliable-sim-agents, and we open-source our agents at https://github.com/Emerge-Lab/gpudrive.
comment: v3
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
Ensuring the safety of autonomous vehicles requires virtual scenario-based testing, which depends on the robust evaluation and generation of safety-critical scenarios. So far, researchers have used scenario-based testing frameworks that rely heavily on handcrafted scenarios as safety metrics. To reduce the effort of human interpretation and overcome the limited scalability of these approaches, we combine Large Language Models (LLMs) with structured scenario parsing and prompt engineering to automatically evaluate and generate safety-critical driving scenarios. We introduce Cartesian and Ego-centric prompt strategies for scenario evaluation, and an adversarial generation module that modifies trajectories of risk-inducing vehicles (ego-attackers) to create critical scenarios. We validate our approach using a 2D simulation framework and multiple pre-trained LLMs. The results show that the evaluation module effectively detects collision scenarios and infers scenario safety. Meanwhile, the new generation module identifies high-risk agents and synthesizes realistic, safety-critical scenarios. We conclude that an LLM equipped with domain-informed prompting techniques can effectively evaluate and generate safety-critical driving scenarios, reducing dependence on handcrafted metrics. We release our open-source code and scenarios at: https://github.com/TUM-AVS/From-Words-to-Collisions.
Enhanced Probabilistic Collision Detection for Motion Planning Under Sensing Uncertainty
Probabilistic collision detection (PCD) is essential in motion planning for robots operating in unstructured environments, where considering sensing uncertainty helps prevent damage. Existing PCD methods mainly used simplified geometric models and addressed only position estimation errors. This paper presents an enhanced PCD method with two key advancements: (a) using superquadrics for more accurate shape approximation and (b) accounting for both position and orientation estimation errors to improve robustness under sensing uncertainty. Our method first computes an enlarged surface for each object that encapsulates its observed rotated copies, thereby addressing the orientation estimation errors. Then, the collision probability under the position estimation errors is formulated as a chance-constraint problem that is solved with a tight upper bound. Both the two steps leverage the recently developed normal parameterization of superquadric surfaces. Results show that our PCD method is twice as close to the Monte-Carlo sampled baseline as the best existing PCD method and reduces path length by 30% and planning time by 37%, respectively. A Real2Sim2Real pipeline further validates the importance of considering orientation estimation errors, showing that the collision probability of executing the planned path in simulation is only 2%, compared to 9% and 29% when considering only position estimation errors or none at all.
Learning Long-Context Diffusion Policies via Past-Token Prediction
Reasoning over long sequences of observations and actions is essential for many robotic tasks. Yet, learning effective long-context policies from demonstrations remains challenging. As context length increases, training becomes increasingly expensive due to rising memory demands, and policy performance often degrades as a result of spurious correlations. Recent methods typically sidestep these issues by truncating context length, discarding historical information that may be critical for subsequent decisions. In this paper, we propose an alternative approach that explicitly regularizes the retention of past information. We first revisit the copycat problem in imitation learning and identify an opposite challenge in recent diffusion policies: rather than over-relying on prior actions, they often fail to capture essential dependencies between past and future actions. To address this, we introduce Past-Token Prediction (PTP), an auxiliary task in which the policy learns to predict past action tokens alongside future ones. This regularization significantly improves temporal modeling in the policy head, with minimal reliance on visual representations. Building on this observation, we further introduce a multistage training strategy: pre-train the visual encoder with short contexts, and fine-tune the policy head using cached long-context embeddings. This strategy preserves the benefits of PTP while greatly reducing memory and computational overhead. Finally, we extend PTP into a self-verification mechanism at test time, enabling the policy to score and select candidates consistent with past actions during inference. Experiments across four real-world and six simulated tasks demonstrate that our proposed method improves the performance of long-context diffusion policies by 3x and accelerates policy training by more than 10x.
comment: Videos are available at https://long-context-dp.github.io
TCAFF: Temporal Consistency for Robot Frame Alignment ICRA 2025
In the field of collaborative robotics, the ability to communicate spatial information like planned trajectories and shared environment information is crucial. When no global position information is available (e.g., indoor or GPS-denied environments), agents must align their coordinate frames before shared spatial information can be properly expressed and interpreted. Coordinate frame alignment is particularly difficult when robots have no initial alignment and are affected by odometry drift. To this end, we develop a novel multiple hypothesis algorithm, called TCAFF, for aligning the coordinate frames of neighboring robots. TCAFF considers potential alignments from associating sparse open-set object maps and leverages temporal consistency to determine an initial alignment and correct for drift, all without any initial knowledge of neighboring robot poses. We demonstrate TCAFF being used for frame alignment in a collaborative object tracking application on a team of four robots tracking six pedestrians and show that TCAFF enables robots to achieve a tracking accuracy similar to that of a system with ground truth localization. The code and hardware dataset are available at https://github.com/mit-acl/tcaff.
comment: 7 pages, 6 figures, accepted to ICRA 2025
Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation ICRA 2025
Visuotactile sensors are a popular tactile sensing strategy due to high-fidelity estimates of local object geometry. However, existing algorithms for processing raw sensor inputs to useful intermediate signals such as contact patches struggle in high-deformation regimes. This is due to physical constraints imposed by sensor hardware and small-deformation assumptions used by mechanics-based models. In this work, we propose a fusion algorithm for proximity and visuotactile point clouds for contact patch segmentation, entirely independent from membrane mechanics. This algorithm exploits the synchronous, high spatial resolution proximity and visuotactile modalities enabled by an extremely deformable, selectively transmissive soft membrane, which uses visible light for visuotactile sensing and infrared light for proximity depth. We evaluate our contact patch algorithm in low (10%), medium (60%), and high (100%+) strain states. We compare our method against three baselines: proximity-only, tactile-only, and a first principles mechanics model. Our approach outperforms all baselines with an average RMSE under 2.8 mm of the contact patch geometry across all strain ranges. We demonstrate our contact patch algorithm in four applications: varied stiffness membranes, torque and shear-induced wrinkling, closed loop control, and pose estimation.
comment: Accepted to ICRA 2025
Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing ICRA 2025
Recent progress in reinforcement learning (RL) and tactile sensing has significantly advanced dexterous manipulation. However, these methods often utilize simplified tactile signals due to the gap between tactile simulation and the real world. We introduce a sensor model for tactile skin that enables zero-shot sim-to-real transfer of ternary shear and binary normal forces. Using this model, we develop an RL policy that leverages sliding contact for dexterous in-hand translation. We conduct extensive real-world experiments to assess how tactile sensing facilitates policy adaptation to various unseen object properties and robot hand orientations. We demonstrate that our 3-axis tactile policies consistently outperform baselines that use only shear forces, only normal forces, or only proprioception. Website: https://jessicayin.github.io/tactile-skin-rl/
comment: Website: https://jessicayin.github.io/tactile-skin-rl/. Accepted to ICRA 2025
Multiagent Systems
Robin: A multi-agent system for automating scientific discovery
Scientific discovery is driven by the iterative process of background research, hypothesis generation, experimentation, and data analysis. Despite recent advancements in applying artificial intelligence to scientific discovery, no system has yet automated all of these stages in a single workflow. Here, we introduce Robin, the first multi-agent system capable of fully automating the key intellectual steps of the scientific process. By integrating literature search agents with data analysis agents, Robin can generate hypotheses, propose experiments, interpret experimental results, and generate updated hypotheses, achieving a semi-autonomous approach to scientific discovery. By applying this system, we were able to identify a novel treatment for dry age-related macular degeneration (dAMD), the major cause of blindness in the developed world. Robin proposed enhancing retinal pigment epithelium phagocytosis as a therapeutic strategy, and identified and validated a promising therapeutic candidate, ripasudil. Ripasudil is a clinically-used rho kinase (ROCK) inhibitor that has never previously been proposed for treating dAMD. To elucidate the mechanism of ripasudil-induced upregulation of phagocytosis, Robin then proposed and analyzed a follow-up RNA-seq experiment, which revealed upregulation of ABCA1, a critical lipid efflux pump and possible novel target. All hypotheses, experimental plans, data analyses, and data figures in the main text of this report were produced by Robin. As the first AI system to autonomously discover and validate a novel therapeutic candidate within an iterative lab-in-the-loop framework, Robin establishes a new paradigm for AI-driven scientific discovery.
IG Parser: A Software Package for the Encoding of Institutional Statements using the Institutional Grammar
This article provides an overview of IG Parser, a software that facilitates qualitative content analysis of formal (e.g., legal) rules or informal (e.g., socio-normative) norms, and strategies (such as conventions) -- referred to as \emph{institutions} -- that govern social systems and operate configurally to describe \emph{institutional systems}. To this end, the IG Parser employs a distinctive syntax that ensures rigorous encoding of natural language, while automating the transformation into various formats that support the downstream analysis using diverse analytical techniques. The conceptual core of the IG Parser is an associated syntax, IG Script, that operationalizes the conceptual foundations of the Institutional Grammar, and more specifically Institutional Grammar 2.0, an analytical paradigm for institutional analysis. This article presents the IG Parser, including its conceptual foundations, syntactic specification of IG Script, alongside architectural principles. This introduction is augmented with selective illustrative examples that highlight the use and benefit associated with the tool.
comment: 24 pages
Synthesis of Communication Policies for Multi-Agent Systems Robust to Communication Restrictions IJCAI 2025
We study stochastic multi-agent systems in which agents must cooperate to maximize the probability of achieving a common reach-avoid objective. In many applications, during the execution of the system, the communication between the agents can be constrained by restrictions on the bandwidth currently available for exchanging local-state information between the agents. In this paper, we propose a method for computing joint action and communication policies for the group of agents that aim to satisfy the communication restrictions as much as possible while achieving the optimal reach-avoid probability when communication is unconstrained. Our method synthesizes a pair of action and communication policies robust to restrictions on the number of agents allowed to communicate. To this end, we introduce a novel cost function that measures the amount of information exchanged beyond what the communication policy allows. We evaluate our approach experimentally on a range of benchmarks and demonstrate that it is capable of computing pairs of action and communication policies that satisfy the communication restrictions, if such exist.
comment: This is the extended version of the paper accepted for publication at IJCAI 2025
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents the traitors seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.
comment: 9 main pages, 31 pages
PyFCG: Fluid Construction Grammar in Python
We present PyFCG, an open source software library that ports Fluid Construction Grammar (FCG) to the Python programming language. PyFCG enables its users to seamlessly integrate FCG functionality into Python programs, and to use FCG in combination with other libraries within Python's rich ecosystem. Apart from a general description of the library, this paper provides three walkthrough tutorials that demonstrate example usage of PyFCG in typical use cases of FCG: (i) formalising and testing construction grammar analyses, (ii) learning usage-based construction grammars from corpora, and (iii) implementing agent-based experiments on emergent communication.
From Grunts to Grammar: Emergent Language from Cooperative Foraging
Early cavemen relied on gestures, vocalizations, and simple signals to coordinate, plan, avoid predators, and share resources. Today, humans collaborate using complex languages to achieve remarkable results. What drives this evolution in communication? How does language emerge, adapt, and become vital for teamwork? Understanding the origins of language remains a challenge. A leading hypothesis in linguistics and anthropology posits that language evolved to meet the ecological and social demands of early human cooperation. Language did not arise in isolation, but through shared survival goals. Inspired by this view, we investigate the emergence of language in multi-agent Foraging Games. These environments are designed to reflect the cognitive and ecological constraints believed to have influenced the evolution of communication. Agents operate in a shared grid world with only partial knowledge about other agents and the environment, and must coordinate to complete games like picking up high-value targets or executing temporally ordered actions. Using end-to-end deep reinforcement learning, agents learn both actions and communication strategies from scratch. We find that agents develop communication protocols with hallmark features of natural language: arbitrariness, interchangeability, displacement, cultural transmission, and compositionality. We quantify each property and analyze how different factors, such as population size and temporal dependencies, shape specific aspects of the emergent language. Our framework serves as a platform for studying how language can evolve from partial observability, temporal reasoning, and cooperative goals in embodied multi-agent settings. We will release all data, code, and models publicly.
Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning AAMAS 2025
Multi-agent reinforcement Learning (MARL) is often challenged by the sight range dilemma, where agents either receive insufficient or excessive information from their environment. In this paper, we propose a novel method, called Dynamic Sight Range Selection (DSR), to address this issue. DSR utilizes an Upper Confidence Bound (UCB) algorithm and dynamically adjusts the sight range during training. Experiment results show several advantages of using DSR. First, we demonstrate using DSR achieves better performance in three common MARL environments, including Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and StarCraft Multi-Agent Challenge (SMAC). Second, our results show that DSR consistently improves performance across multiple MARL algorithms, including QMIX and MAPPO. Third, DSR offers suitable sight ranges for different training steps, thereby accelerating the training process. Finally, DSR provides additional interpretability by indicating the optimal sight range used during training. Unlike existing methods that rely on global information or communication mechanisms, our approach operates solely based on the individual sight ranges of agents. This approach offers a practical and efficient solution to the sight range dilemma, making it broadly applicable to real-world complex environments.
comment: Accepted at AAMAS 2025. The compiled PDF includes the appendix
Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments
This study introduces a novel inference framework, designated as Bayesian and inverse Bayesian (BIB) inference, which concurrently performs both conventional and inverse Bayesian updates by integrating symmetry bias into Bayesian inference. The effectiveness of the model was evaluated through a sequential estimation task involving observations sampled from a Gaussian distribution with a stochastically time-varying mean. Conventional Bayesian inference entails a fundamental trade-off between adaptability to abrupt environmental shifts and estimation accuracy during stable intervals. The BIB framework addresses this limitation by dynamically modulating the learning rate through inverse Bayesian updates, thereby enhancing adaptive flexibility. The BIB model generated spontaneous bursts in the learning rate during sudden environmental transitions, transiently entering a high-sensitivity state to accommodate incoming data. This intermittent burst-relaxation pattern functions as a dynamic mechanism that balances adaptability and accuracy. Further analysis of burst interval distributions demonstrated that the BIB model consistently produced power-law distributions under diverse conditions. Such robust scaling behavior, absent in conventional Bayesian inference, appears to emerge from a self-regulatory mechanism driven by inverse Bayesian updates. These results present a novel computational perspective on scale-free phenomena in natural systems and offer implications for designing adaptive inference systems in nonstationary environments.
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI
Advances in deep generative modelling have made it increasingly plausible to train human-level embodied agents. Yet progress has been limited by the absence of large-scale, real-time, multi-modal, and socially interactive datasets that reflect the sensory-motor complexity of natural environments. To address this, we present PLAICraft, a novel data collection platform and dataset capturing multiplayer Minecraft interactions across five time-aligned modalities: video, game output audio, microphone input audio, mouse, and keyboard actions. Each modality is logged with millisecond time precision, enabling the study of synchronous, embodied behaviour in a rich, open-ended world. The dataset comprises over 10,000 hours of gameplay from more than 10,000 global participants.\footnote{We have done a privacy review for the public release of an initial 200-hour subset of the dataset, with plans to release most of the dataset over time.} Alongside the dataset, we provide an evaluation suite for benchmarking model capabilities in object recognition, spatial awareness, language grounding, and long-term memory. PLAICraft opens a path toward training and evaluating agents that act fluently and purposefully in real time, paving the way for truly embodied artificial intelligence.
comment: 9 pages, 8 figures
Lightweight and Effective Preference Construction in PIBT for Large-Scale Multi-Agent Pathfinding
PIBT is a computationally lightweight algorithm that can be applied to a variety of multi-agent pathfinding (MAPF) problems, generating the next collision-free locations of agents given another. Because of its simplicity and scalability, it is becoming a popular underlying scheme for recent large-scale MAPF methods involving several hundreds or thousands of agents. Vanilla PIBT makes agents behave greedily towards their assigned goals, while agents typically have multiple best actions, since the graph shortest path is not always unique. Consequently, tiebreaking about how to choose between these actions significantly affects resulting solutions. This paper studies two simple yet effective techniques for tiebreaking in PIBT, without compromising its computational advantage. The first technique allows an agent to intelligently dodge another, taking into account whether each action will hinder the progress of the next timestep. The second technique is to learn, through multiple PIBT runs, how an action causes regret in others and to use this information to minimise regret collectively. Our empirical results demonstrate that these techniques can reduce the solution cost of one-shot MAPF and improve the throughput of lifelong MAPF. For instance, in densely populated one-shot cases, the combined use of these tiebreaks achieves improvements of around 10-20% in sum-of-costs, without significantly compromising the speed of a PIBT-based planner.
comment: To be presented at SoCS-25
The Hamiltonian of Poly-matrix Zero-sum Games
Understanding a dynamical system fundamentally relies on establishing an appropriate Hamiltonian function and elucidating its symmetries. By formulating agents' strategies and cumulative payoffs as canonically conjugate variables, we identify the Hamiltonian function that generates the dynamics of poly-matrix zero-sum games. We reveal the symmetries of our Hamiltonian and derive the associated conserved quantities, showing how the conservation of probability and the invariance of the Fenchel coupling are intrinsically encoded within the system. Furthermore, we propose the dissipation FTRL (DFTRL) dynamics by introducing a perturbation that dissipates the Fenchel coupling, proving convergence to the Nash equilibrium and linking DFTRL to last-iterate convergent algorithms. Our results highlight the potential of Hamiltonian dynamics in uncovering the structural properties of learning dynamics in games, and pave the way for broader applications of Hamiltonian dynamics in game theory and machine learning.
comment: 26 pages, 4 figures
Model Cards for AI Teammates: Comparing Human-AI Team Familiarization Methods for High-Stakes Environments
We compare three methods of familiarizing a human with an artificial intelligence (AI) teammate ("agent") prior to operation in a collaborative, fast-paced intelligence, surveillance, and reconnaissance (ISR) environment. In a between-subjects user study (n=60), participants either read documentation about the agent, trained alongside the agent prior to the mission, or were given no familiarization. Results showed that the most valuable information about the agent included details of its decision-making algorithms and its relative strengths and weaknesses compared to the human. This information allowed the familiarization groups to form sophisticated team strategies more quickly than the control group. Documentation-based familiarization led to the fastest adoption of these strategies, but also biased participants towards risk-averse behavior that prevented high scores. Participants familiarized through direct interaction were able to infer much of the same information through observation, and were more willing to take risks and experiment with different control modes, but reported weaker understanding of the agent's internal processes. Significant differences were seen between individual participants' risk tolerance and methods of AI interaction, which should be considered when designing human-AI control interfaces. Based on our findings, we recommend a human-AI team familiarization method that combines AI documentation, structured in-situ training, and exploratory interaction.
comment: Submitted to IEEE RO-MAN 2025 (under review). 8 pages, 7 figures
Non-Obvious Manipulability in Additively Separable and Fractional Hedonic Games IJCAI'25
In this work, we consider the design of Non-Obviously Manipulable (NOM) mechanisms, mechanisms that bounded rational agents may fail to recognize as manipulable, for two relevant classes of succinctly representable Hedonic Games: Additively Separable and Fractional Hedonic Games. In these classes, agents have cardinal scores towards other agents, and their preferences over coalitions are determined by aggregating such scores. This aggregation results in a utility function for each agent, which enables the evaluation of outcomes via the utilitarian social welfare. We first prove that, when scores can be arbitrary, every optimal mechanism is NOM; moreover, when scores are limited in a continuous interval, there exists an optimal mechanism that is NOM. Given the hardness of computing optimal outcomes in these settings, we turn our attention to efficient and NOM mechanisms. To this aim, we first prove a characterization of NOM mechanisms that simplifies the class of mechanisms of interest. Then, we design a NOM mechanism returning approximations that asymptotically match the best-known approximation achievable in polynomial time. Finally, we focus on discrete scores, where the compatibility of NOM with optimality depends on the specific values. Therefore, we initiate a systematic analysis to identify which discrete values support this compatibility and which do not.
comment: Accepted paper at IJCAI'25
Prompt Stability Matters: Evaluating and Optimizing Auto-Generated Prompt in General-Purpose Systems
Automatic prompt generation plays a crucial role in enabling general-purpose multi-agent systems to perform diverse tasks autonomously. Existing methods typically evaluate prompts based on their immediate task performance, overlooking the intrinsic qualities that determine their reliability. This outcome-centric view not only limits interpretability but also fails to account for the inherent stochasticity of large language models (LLMs). In this work, we bring attention to prompt stability-the consistency of model responses across repeated executions-as a key factor for building robust and effective prompt generation systems. To quantify this, we propose semantic stability as a criterion for assessing the response consistency of prompts, and fine-tune a LLaMA-based evaluator to measure it automatically across tasks. These components have enabled us to develop the first stability-aware general-purpose prompt generation system that leverages stability feedback to iteratively enhance both prompt quality and system-level performance. Furthermore, we establish a logical chain between prompt stability and task success by analyzing the structural dependencies within our system, proving stability as a necessary condition for effective system-level execution. Empirical results across general and domain-specific tasks demonstrate that our stability-aware framework improves both accuracy and output consistency. By shifting the focus from one-off results to persistent reliability, our work offers a new perspective on prompt design and contributes practical tools for building more trustworthy general-purpose systems.
Origin-Destination Pattern Effects on Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning
Traffic congestion remains a major challenge for modern urban transportation, diminishing both efficiency and quality of life. While autonomous driving technologies and reinforcement learning (RL) have shown promise for improving traffic control, most prior work has focused on small-scale networks or isolated intersections. Large-scale mixed traffic control, involving both human-driven and robotic vehicles, remains underexplored. In this study, we propose a decentralized multi-agent reinforcement learning framework for managing large-scale mixed traffic networks, where intersections are controlled either by traditional traffic signals or by robotic vehicles. We evaluate our approach on a real-world network of 14 intersections in Colorado Springs, Colorado, USA, using average vehicle waiting time as the primary measure of traffic efficiency. Results demonstrate that strategically adjusting major origin-destination (OD) flow patterns can effectively reduce congestion, offering a new pathway for enhancing urban mobility.
Beyond Single Pass, Looping Through Time: KG-IRAG with Iterative Knowledge Retrieval
Graph Retrieval-Augmented Generation (GraphRAG) has proven highly effective in enhancing the performance of Large Language Models (LLMs) on tasks that require external knowledge. By leveraging Knowledge Graphs (KGs), GraphRAG improves information retrieval for complex reasoning tasks, providing more precise and comprehensive retrieval and generating more accurate responses to QAs. However, most RAG methods fall short in addressing multi-step reasoning, particularly when both information extraction and inference are necessary. To address this limitation, this paper presents Knowledge Graph-Based Iterative Retrieval-Augmented Generation (KG-IRAG), a novel framework that integrates KGs with iterative reasoning to improve LLMs' ability to handle queries involving temporal and logical dependencies. Through iterative retrieval steps, KG-IRAG incrementally gathers relevant data from external KGs, enabling step-by-step reasoning. The proposed approach is particularly suited for scenarios where reasoning is required alongside dynamic temporal data extraction, such as determining optimal travel times based on weather conditions or traffic patterns. Experimental results show that KG-IRAG improves accuracy in complex reasoning tasks by effectively integrating external knowledge with iterative, logic-based retrieval. Additionally, three new datasets: weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW, are formed to evaluate KG-IRAG's performance, demonstrating its potential beyond traditional RAG applications.
comment: 15 pages, 3 figures
Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework
The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations -- one of the essential experimental technologies -- remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It incorporates three proposed modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from Daline and MATPOWER, this framework achieves success rates of 93.13% and 96.85%, respectively. It significantly outperforms ChatGPT 4o, o1-preview, and the fine-tuned GPT-4o, which all achieved a success rate lower than 30% on complex tasks. Additionally, the proposed framework also supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average cost of 0.014 USD for tokens. Overall, this adaptable framework lays a foundation for developing intelligent LLM-based assistants for human researchers, facilitating power system research and beyond.
comment: 16 pages
The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems
Consensus formation is pivotal in multi-agent systems (MAS), balancing collective coherence with individual diversity. Conventional LLM-based MAS primarily rely on explicit coordination, e.g., prompts or voting, risking premature homogenization. We argue that implicit consensus, where agents exchange information yet independently form decisions via in-context learning, can be more effective in dynamic environments that require long-horizon adaptability. By retaining partial diversity, systems can better explore novel strategies and cope with external shocks. We formalize a consensus-diversity tradeoff, showing conditions where implicit methods outperform explicit ones. Experiments on three scenarios -- Dynamic Disaster Response, Information Spread and Manipulation, and Dynamic Public-Goods Provision -- confirm partial deviation from group norms boosts exploration, robustness, and performance. We highlight emergent coordination via in-context learning, underscoring the value of preserving diversity for resilient decision-making.
comment: Source codes are available at https://github.com/wuzengqing001225/ConsensusDiversityTradeoffMAS
Causal Graph Dynamics and Kan Extensions
On the one side, the formalism of Global Transformations comes with the claim of capturing any transformation of space that is local, synchronous and deterministic. The claim has been proven for different classes of models such as mesh refinements from computer graphics, Lindenmayer systems from morphogenesis modeling and cellular automata from biological, physical and parallel computation modeling. The Global Transformation formalism achieves this by using category theory for its genericity, and more precisely the notion of Kan extension to determine the global behaviors based on the local ones. On the other side, Causal Graph Dynamics describe the transformation of port graphs in a synchronous and deterministic way and has not yet being tackled. In this paper, we show the precise sense in which the claim of Global Transformations holds for them as well. This is done by showing different ways in which they can be expressed as Kan extensions, each of them highlighting different features of Causal Graph Dynamics. Along the way, this work uncovers the interesting class of Monotonic Causal Graph Dynamics and their universality among General Causal Graph Dynamics.
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework
Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47\% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.
comment: 29 pages, 8 figures, 4 tables
Asynchronous Credit Assignment for Multi-Agent Reinforcement Learning
Credit assignment is a critical problem in multi-agent reinforcement learning (MARL), aiming to identify agents' marginal contributions for optimizing cooperative policies. Current credit assignment methods typically assume synchronous decision-making among agents. However, many real-world scenarios require agents to act asynchronously without waiting for others. This asynchrony introduces conditional dependencies between actions, which pose great challenges to current methods. To address this issue, we propose an asynchronous credit assignment framework, incorporating a Virtual Synchrony Proxy (VSP) mechanism and a Multiplicative Value Decomposition (MVD) algorithm. VSP enables physically asynchronous actions to be virtually synchronized during credit assignment. We theoretically prove that VSP preserves both task equilibrium and algorithm convergence. Furthermore, MVD leverages multiplicative interactions to effectively model dependencies among asynchronous actions, offering theoretical advantages in handling asynchronous tasks. Extensive experiments show that our framework consistently outperforms state-of-the-art MARL methods on challenging tasks while providing improved interpretability for asynchronous cooperation.
Revisiting Communication Efficiency in Multi-Agent Reinforcement Learning from the Dimensional Analysis Perspective
In this work, we introduce a novel perspective, i.e., dimensional analysis, to address the challenge of communication efficiency in Multi-Agent Reinforcement Learning (MARL). Our findings reveal that simply optimizing the content and timing of communication at sending end is insufficient to fully resolve communication efficiency issues. Even after applying optimized and gated messages, dimensional redundancy and confounders still persist in the integrated message embeddings at receiving end, which negatively impact communication quality and decision-making. To address these challenges, we propose Dimensional Rational Multi-Agent Communication (DRMAC), designed to mitigate both dimensional redundancy and confounders in MARL. DRMAC incorporates a redundancy-reduction regularization term to encourage the decoupling of information across dimensions within the learned representations of integrated messages. Additionally, we introduce a dimensional mask that dynamically adjusts gradient weights during training to eliminate the influence of decision-irrelevant dimensions. We evaluate DRMAC across a diverse set of multi-agent tasks, demonstrating its superior performance over existing state-of-the-art methods in complex scenarios. Furthermore, the plug-and-play nature of DRMAC's key modules highlights its generalizable performance, serving as a valuable complement rather than a replacement for existing multi-agent communication strategies.
Systems and Control (CS)
An Empirical Bayes approach to ARX Estimation
Empirical Bayes inference is based on estimation of the parameters of an a priori distribution from the observed data. The estimation technique of the parameters of the prior, called hyperparameters, is based on the marginal distribution obtained by integrating the joint density of the model with respect to the prior. This is a key step which needs to be properly adapted to the problem at hand. In this paper we study Empirical Bayes inference of linear autoregressive models with inputs (ARX models) for time series and compare the performance of the marginal parametric estimator with that a full Empirical Bayesian analysis based on the estimated prior. Such a comparison, can only make sense for a (realistic) finite data length. In this setting, we propose a new estimation technique of the hyperparameters by a sequential Bayes procedure which is essentially a backward Kalman filter. It turns out that for finite data length the marginal Bayes tends to behave slightly better than the full Empirical Bayesian parameter estimator and so also in the case of slowly varying random parameters.
Quantum Hardware-in-the-Loop for Optimal Power Flow in Renewable-Integrated Power Systems
This paper presents a proof-of-concept for integrating quantum hardware with real-time digital simulator (RTDS) to model and control modern power systems, including renewable energy resources. Power flow (PF) analysis and optimal power flow (OPF) studies are conducted using RTDS coupled with Fujitsu's CMOS Digital Annealer and D-Wave's Advantage quantum processors. The adiabatic quantum power flow (AQPF) and adiabatic quantum optimal power flow (AQOPF) algorithms are used to perform PF and OPF, respectively, on quantum and quantum-inspired hardware. The experiments are performed on the IEEE 9-bus test system and a modified version that includes solar and wind farms. The findings demonstrate that the AQPF and AQOPF algorithms can accurately perform PF and OPF, yielding results that closely match those of classical Newton-Raphson (NR) solvers while also exhibiting robust convergence. Furthermore, the integration of renewable energy sources (RES) within the AQOPF framework proves effective in maintaining system stability and performance, even under variable generation conditions. These findings highlight the potential of quantum computing to significantly enhance the modeling and control of future power grids, particularly in systems with high renewable energy penetration.
Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems
We investigate multitask edge-user communication-computation resource allocation for $360^\circ$ video streaming in an edge-computing enabled millimeter wave (mmWave) multi-user virtual reality system. To balance the communication-computation trade-offs that arise herein, we formulate a video quality maximization problem that integrates interdependent multitask/multi-user action spaces and rebuffering time/quality variation constraints. We formulate a deep reinforcement learning framework for \underline{m}ulti-\underline{t}ask \underline{r}ate adaptation and \underline{c}omputation distribution (MTRC) to solve the problem of interest. Our solution does not rely on a priori knowledge about the environment and uses only prior video streaming statistics (e.g., throughput, decoding time, and transmission delay), and content information, to adjust the assigned video bitrates and computation distribution, as it observes the induced streaming performance online. Moreover, to capture the task interdependence in the environment, we leverage neural network cascades to extend our MTRC method to two novel variants denoted as R1C2 and C1R2. We train all three methods with real-world mmWave network traces and $360^\circ$ video datasets to evaluate their performance in terms of expected quality of experience (QoE), viewport peak signal-to-noise ratio (PSNR), rebuffering time, and quality variation. We outperform state-of-the-art rate adaptation algorithms, with C1R2 showing best results and achieving $5.21-6.06$ dB PSNR gains, $2.18-2.70$x rebuffering time reduction, and $4.14-4.50$ dB quality variation reduction.
comment: Accepted to be published in IEEE Transaction on Multimedia
Learning Driven Elastic Task Multi-Connectivity Immersive Computing Systems
In virtual reality (VR) environments, computational tasks exhibit an elastic nature, meaning they can dynamically adjust based on various user and system constraints. This elasticity is essential for maintaining immersive experiences; however, it also introduces challenges for communication and computing in VR systems. In this paper, we investigate elastic task offloading for multi-user edge-computing-enabled VR systems with multi-connectivity, aiming to maximize the computational energy-efficiency (computational throughput per unit of energy consumed). To balance the induced communication, computation, energy consumption, and quality of experience trade-offs due to the elasticity of VR tasks, we formulate a constrained stochastic computational energy-efficiency optimization problem that integrates the multi-connectivity/multi-user action space and the elastic nature of VR computational tasks. We formulate a centralized phasic policy gradient (CPPG) framework to solve the problem of interest online, using only prior elastic task offloading statistics (energy consumption, response time, and transmission time), and task information (i.e., task size and computational intensity), while observing the induced system performance (energy consumption and latency). We further extend our approach to decentralized learning by formulating an independent phasic policy gradient (IPPG) method and a decentralized shared multi-armed bandit (DSMAB) method. We train our methods with real-world 4G, 5G, and WiGig network traces and 360 video datasets to evaluate their performance in terms of response time, energy efficiency, scalability, and delivered quality of experience. We also provide a comprehensive analysis of task size and its effect on offloading policy and system performance. In particular, we show that CPPG reduces latency by 28% and energy consumption by 78% compared to IPPG.
comment: Under review by IEEE Transaction on Mobile Computing
Frequency-Dependent Power Consumption Modeling of CMOS Transmitters for WNoC Architectures
Wireless Network-on-Chip (WNoC) systems, which wirelessly interconnect the chips of a computing system, have been proposed as a complement to existing chip-to-chip wired links. However, their feasibility depends on the availability of custom-designed high-speed, tiny, ultra-efficient transceivers. This represents a challenge due to the tradeoffs between bandwidth, area, and energy efficiency that are found as frequency increases, which suggests that there is an optimal frequency region. To aid in the search for such an optimal design point, this paper presents a behavioral model that quantifies the expected power consumption of oscillators, mixers, and power amplifiers as a function of frequency. The model is built on extensive surveys of the respective sub-blocks, all based on experimental data. By putting together the models of the three sub-blocks, a comprehensive power model is obtained, which will aid in selecting the optimal operating frequency for WNoC systems.
Output behavior equivalence and simultaneous subspace identification of systems and faults
We address the problem of identifying a system subject to additive faults, while simultaneously reconstructing the fault signal via subspace methods. We do not require nominal data for the identification, neither do we impose any assumption on the class of faults, e.g., sensor or actuator faults. We show that, under mild assumptions on the fault signal, standard PI-MOESP can recover the system matrices associated to the input-output subsystem. Then we introduce the concept of output behavior equivalence, which characterizes systems with the same output behavior set, and present a method to establish this equivalence from system matrices. Finally, we show how to estimate from data the complete set of fault matrices for which there exist a fault signal with minimal dimension that explains the data.
comment: 7 pages, 2 figures. Under review. To reproduce the results of this preprint, see https://github.com/ggleizer/fault_id
Low-regret Strategies for Energy Systems Planning in a Highly Uncertain Future
Large uncertainties in the energy transition urge decision-makers to develop low-regret strategies, i.e., strategies that perform well regardless of how the future unfolds. To address this challenge, we introduce a decision-support framework that identifies low-regret strategies in energy system planning under uncertainty. Our framework (i) automatically identifies strategies, (ii) evaluates their performance in terms of regret, (iii) assesses the key drivers of regret, and (iv) supports the decision process with intuitive decision trees, regret curves and decision maps. We apply the framework to evaluate the optimal use of biomass in the transition to net-zero energy systems, considering all major biomass utilization options: biofuels, biomethane, chemicals, hydrogen, biochar, electricity, and heat. Producing fuels and chemicals from biomass performs best across various decision-making criteria. In contrast, the current use of biomass, mainly for low-temperature heat supply, results in high regret, making it a must-avoid in the energy transition.
comment: Main paper: 22 pages, 5 figures; Supplementary Information: 70 pages, 100 figures
Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing
Diffusion models hold great potential in robotics due to their ability to capture complex, high-dimensional data distributions. However, their lack of constraint-awareness limits their deployment in safety-critical applications. We propose Constraint-Aware Diffusion Guidance (CoDiG), a data-efficient and general-purpose framework that integrates barrier functions into the denoising process, guiding diffusion sampling toward constraint-satisfying outputs. CoDiG enables constraint satisfaction even with limited training data and generalizes across tasks. We evaluate our framework in the challenging setting of miniature autonomous racing, where real-time obstacle avoidance is essential. Real-world experiments show that CoDiG generates safe outputs efficiently under dynamic conditions, highlighting its potential for broader robotic applications. A demonstration video is available at https://youtu.be/KNYsTdtdxOU.
RSS-Based Localization: Ensuring Consistency and Asymptotic Efficiency
We study the problem of signal source localization using received signal strength measurements. We begin by presenting verifiable geometric conditions for sensor deployment that ensure the model's asymptotic localizability. Then we establish the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, computing the ML estimator is challenging due to its reliance on solving a non-convex optimization problem. To overcome this, we propose a two-step estimator that retains the same asymptotic properties as the ML estimator while offering low computational complexity, linear in the number of measurements. The main challenge lies in obtaining a consistent estimator in the first step. To address this, we construct two linear least-squares estimation problems by applying algebraic transformations to the nonlinear measurement model, leading to closed-form solutions. In the second step, we perform a single Gauss-Newton iteration using the consistent estimator from the first step as the initialization, achieving the same asymptotic efficiency as the ML estimator. Finally, simulation results validate the theoretical property and practical effectiveness of the proposed two-step estimator.
Disentangling Coordiante Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers
Task performance in terms of task completion time in teleoperation is still far behind compared to humans conducting tasks directly. One large identified impact on this is the human capability to perform transformations and alignments, which is directly influenced by the point of view and the motion retargeting strategy. In modern teleoperation systems, motion retargeting is usually implemented through a one time calibration or switching modes. Complex tasks, like concatenated screwing, might be difficult, because the operator has to align (e.g. mirror) rotational and translational input commands. Recent research has shown, that the separation of translation and rotation leads to increased task performance. This work proposes a formal motion retargeting method, which separates translational and rotational input commands. This method is then included in a optimal control based trajectory planner and shown to work on a UR5e manipulator.
comment: 8 pages, 4 figures, conference
Regularized Model Predictive Control
In model predictive control (MPC), the choice of cost-weighting matrices and designing the Hessian matrix directly affects the trade-off between rapid state regulation and minimizing the control effort. However, traditional MPC in quadratic programming relies on fixed design matrices across the entire horizon, which can lead to suboptimal performance. This letter presents a Riccati equation-based method for adjusting the design matrix within the MPC framework, which enhances real-time performance. We employ a penalized least-squares (PLS) approach to derive a quadratic cost function for a discrete-time linear system over a finite prediction horizon. Using the method of weighting and enforcing the constraint equation by introducing a large penalty parameter, we solve the constrained optimization problem and generate control inputs for forward-shifted horizons. This process yields a recursive PLS-based Riccati equation that updates the design matrix as a regularization term in each shift, forming the foundation of the regularized MPC (Re-MPC) algorithm. To accomplish this, we provide a convergence and stability analysis of the developed algorithm. Numerical analysis demonstrates its superiority over traditional methods by allowing Riccati equation-based adjustments.
6G-Enabled Smart Railways
Smart railways integrate advanced information technologies into railway operating systems to improve efficiency and reliability. Although the development of 5G has enhanced railway services, future smart railways require ultra-high speeds, ultra-low latency, ultra-high security, full coverage, and ultra-high positioning accuracy, which 5G cannot fully meet. Therefore, 6G is envisioned to provide green and efficient all-day operations, strong information security, fully automatic driving, and low-cost intelligent maintenance. To achieve these requirements, we propose an integrated network architecture leveraging communications, computing, edge intelligence, and caching in railway systems. We have conducted in-depth investigations on key enabling technologies for reliable transmissions and wireless coverage. For high-speed mobile scenarios, we propose an AI-enabled cross-domain channel modeling and orthogonal time-frequency space-time spread multiple access mechanism to alleviate the conflict between limited spectrum availability and massive user access. The roles of blockchain, edge intelligence, and privacy technologies in endogenously secure rail communications are also evaluated. We further explore the application of emerging paradigms such as integrated sensing and communications, AI-assisted Internet of Things, semantic communications, and digital twin networks for railway maintenance, monitoring, prediction, and accident warning. Finally, possible future research and development directions are discussed.
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach
The pursuit of rate maximization in wireless communication frequently encounters substantial challenges associated with user fairness. This paper addresses these challenges by exploring a novel power allocation approach for delay optimization, utilizing graph neural networks (GNNs)-based reinforcement learning (RL) in device-to-device (D2D) communication. The proposed approach incorporates not only channel state information but also factors such as packet delay, the number of backlogged packets, and the number of transmitted packets into the components of the state information. We adopt a centralized RL method, where a central controller collects and processes the state information. The central controller functions as an agent trained using the proximal policy optimization (PPO) algorithm. To better utilize topology information in the communication network and enhance the generalization of the proposed method, we embed GNN layers into both the actor and critic networks of the PPO algorithm. This integration allows for efficient parameter updates of GNNs and enables the state information to be parameterized as a low-dimensional embedding, which is leveraged by the agent to optimize power allocation strategies. Simulation results demonstrate that the proposed method effectively reduces average delay while ensuring user fairness, outperforms baseline methods, and exhibits scalability and generalization capability.
Scheduling of Flexible Manufacturing Systems Based on Place-Timed Petri Nets and Basis Reachability Graphs
Scheduling is a key decision-making process to improve the performance of flexible manufacturing systems. Place-timed Petri nets provide a formal method for graphically modeling and analyzing such systems. By generating reachability graphs and combining intelligent search algorithms, operation sequences from the initial state to the target state can be found for the underlying system. However, the reachability graph grows exponentially with the system size increases, which is the main challenge of existing methods for scheduling large systems. To this end, we develop an efficient improved beam search algorithm to optimize the makespan based on a compact representation of reachability graph called basis reachability graph. The key idea behind the proposed method is to form a state together with the basis markings and its corresponding transition sequences, and evaluate the cost of the state based on the resource idle time. Experimental results are conducted on several benchmark systems which show that the developed method improves the search efficiency while ensuring the quality of the solution compared with existing methods.
2T1R Regulated Memristor Conductance Control Array Architecture for Neuromorphic Computing using 28nm CMOS Technology
Memristors are promising devices for scalable and low power, in-memory computing to improve the energy efficiency of a rising computational demand. The crossbar array architecture with memristors is used for vector matrix multiplication (VMM) and acts as kernels in neuromorphic computing. The analog conductance control in a memristor is achieved by applying voltage or current through it. A basic 1T1R array is suitable to avoid sneak path issues but suffer from wire resistances, which affects the read and write procedures. A conductance control scheme with a regulated voltage source will improve the architecture and reduce the possible potential divider effects. A change in conductance is also possible with the provision of a regulated current source and measuring the voltage across the memristors. A regulated 2T1R memristor conductance control architecture is proposed in this work, which avoids the potential divider effect and virtual ground scenario in a regular crossbar scheme, as well as conductance control by passing a regulated current through memristors. The sneak path current is not allowed to pass by the provision of ground potential to both terminals of memristors.
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements are developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
Secrecy Capacity of Hybrid VLC-RF Systems with Light Energy Harvesting
This paper studies the performance of physical layer security (PLS) in a multi-user hybrid heterogeneous visible light communication (VLC) and radio frequency (RF) wireless communication system with simultaneous lightwave information and power transfer (SLIPT). In the considered system, VLC is used for downlink (DL) while RF is employed for uplink (UL) transmission. In addition, to support multiple users, time division multiple access (TDMA) is assumed for both DL and UL channels. In the DL, each user receives information during its allocated time slot of the TDMA frame and harvests energy from the received signal outside the time slot. The harvested energy is then used for transmitting the signal over the UL channel, which is subject to eavesdropping by an unauthorized user. Therefore, PLS is employed to protect the confidentiality of the UL information. Then, an optimization problem is formulated to solve the optimal DL and UL time slots that maximize the PLS performance given a target sum rate of the DL. We show that the problem can be cast as a difference of convex functions (DC) program, which can be solved efficiently using the DC algorithm (DCA).
A Control Oriented Fractional-Order Model of Lithium-ion Batteries Based on Caputo Definition
This letter proposes a fractional-order battery model based on the Caputo definition. A closed-form time-domain solution is derived, enabling a simple recursive expression for discrete-time implementation. The model requires only the current and previous time-step states in each iteration, significantly reducing memory usage compared to the conventional Gr\"{u}nwald--Letnikov (G-L) method. This recursive structure is highly compatible with filter design and online parameter identification. Experimental validation on a 40.2~Ah NCM622 cell shows that the proposed first-order model achieves voltage prediction accuracy comparable to a second-order integer-order model. The results demonstrate that the Caputo-based model offers a practical balance between accuracy and computational efficiency, making it well suited for real-time battery management systems (BMS).
Transmission Neural Networks: Approximation and Optimal Control
Transmission Neural Networks (TransNNs) introduced by Gao and Caines (2022) connect virus spread models over networks and neural networks with tuneable activation functions. This paper presents the approximation technique and the underlying assumptions employed by TransNNs in relation to the corresponding Markovian Susceptible-Infected-Susceptible (SIS) model with 2^n states, where n is the number of nodes in the network. The underlying infection paths are assumed to be stochastic with heterogeneous and time-varying transmission probabilities. We obtain the conditional probability of infection in the stochastic 2^n-state SIS epidemic model corresponding to each state configuration under mild assumptions, which enables control solutions based on Markov decision processes (MDP). Finally, MDP control with 2^n-state SIS epidemic models and optimal control with TransNNs are compared in terms of mitigating virus spread over networks through vaccination, and it is shown that TranNNs enable the generation of control laws with significant computational savings, albeit with more conservative control actions.
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
Incremental Firmware Update Over-the-Air for Low-Power IoT Devices over LoRaWAN
Efficiently supporting remote firmware updates in Internet of Things (IoT) devices remains a significant challenge due to the limitations of many IoT communication protocols, which often make it impractical to transmit full firmware images. Techniques such as firmware partitioning have been introduced to mitigate this issue, but they frequently fall short, especially in battery-powered systems where time and energy constraints are critical. As a result, physical maintenance interventions are still commonly required, which is costly and inconvenient in large-scale deployments. In this work, we present a lightweight and innovative method that addresses this challenge by generating highly compact delta patches, enabling firmware reconstruction directly on the device. Our algorithm is specifically optimized for low-power devices, minimizing both memory usage and computational overhead. Compared to existing solutions, our approach significantly reduces the data volume needed for updates while maintaining performance comparable to more complex alternatives. Experimental evaluations confirm that our method yields substantial time and energy savings, making it particularly well-suited for battery-powered IoT nodes. Although our implementation targets the LoRaWAN protocol, the approach is flexible and can be adapted to other IoT communication technologies.
Dynamic Bipedal MPC with Foot-level Obstacle Avoidance and Adjustable Step Timing
Collision-free planning is essential for bipedal robots operating within unstructured environments. This paper presents a real-time Model Predictive Control (MPC) framework that addresses both body and foot avoidance for dynamic bipedal robots. Our contribution is two-fold: we introduce (1) a novel formulation for adjusting step timing to facilitate faster body avoidance and (2) a novel 3D foot-avoidance formulation that implicitly selects swing trajectories and footholds that either steps over or navigate around obstacles with awareness of Center of Mass (COM) dynamics. We achieve body avoidance by applying a half-space relaxation of the safe region but introduce a switching heuristic based on tracking error to detect a need to change foot-timing schedules. To enable foot avoidance and viable landing footholds on all sides of foot-level obstacles, we decompose the non-convex safe region on the ground into several convex polygons and use Mixed-Integer Quadratic Programming to determine the optimal candidate. We found that introducing a soft minimum-travel-distance constraint is effective in preventing the MPC from being trapped in local minima that can stall half-space relaxation methods behind obstacles. We demonstrated the proposed algorithms on multibody simulations on the bipedal robot platforms, Cassie and Digit, as well as hardware experiments on Digit.
Reduced Muscle Fatigue Using Continuous Subthreshold Kilohertz Stimulation of Peripheral Nerves
Functional electrical stimulation (FES) is a prevalent technique commonly used to activate muscles in individuals with neurological disorders. Traditional FES strategies predominantly utilize low-frequency (LF) stimulation, which evokes synchronous action potentials, leading to rapid muscle fatigue. To address these limitations, we introduced a subthreshold high-frequency (HF) stimulation method that employed continuous, charge-balanced subthreshold current pulses at kilohertz frequencies, designed to evoke motor unit (MU) activation similar to voluntary activation. We evaluated the effectiveness of HF stimulation on the reduction of muscle fatigue across different force levels (10 %, 25 %, and 40 % of maximum force). The HF stimulation utilized continuous charge-balanced, short pulses of 80 {\mu}s (at a 10 kHz frequency) targeted the ulnar/median nerve bundles. We compared the fatigue effects with conventional LF stimulation and voluntary muscle contractions. Our results indicated that HF stimulation maintained more sustained force outputs and muscle activation over a prolonged time compared with LF stimulation. The HF stimulation also evoked a more dispersed muscle activation pattern, similar to voluntary muscle contractions. These findings suggest that HF stimulation can significantly enhance the sustainability of muscle contractions and reduce muscle fatigue, potentially improving the efficacy and applicability of FES in clinical and home-based settings for individuals with neurological impairments.
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
State-Dependent Conformal Perception Bounds for Neuro-Symbolic Verification of Autonomous Systems
It remains a challenge to provide safety guarantees for autonomous systems with neural perception and control. A typical approach obtains symbolic bounds on perception error (e.g., using conformal prediction) and performs verification under these bounds. However, these bounds can lead to drastic conservatism in the resulting end-to-end safety guarantee. This paper proposes an approach to synthesize symbolic perception error bounds that serve as an optimal interface between perception performance and control verification. The key idea is to consider our error bounds to be heteroskedastic with respect to the system's state -- not time like in previous approaches. These bounds can be obtained with two gradient-free optimization algorithms. We demonstrate that our bounds lead to tighter safety guarantees than the state-of-the-art in a case study on a mountain car.
comment: Accepted at the 2nd International Conference on Neuro-symbolic Systems (NeuS) 2025 to NeuS'25
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
Rapid and Inexpensive Inertia Tensor Estimation from a Single Object Throw
The inertia tensor is an important parameter in many engineering fields, but measuring it can be cumbersome and involve multiple experiments or accurate and expensive equipment. We propose a method to measure the moment of inertia tensor of a rigid body from a single spinning throw, by attaching a small and inexpensive stand-alone measurement device consisting of a gyroscope, accelerometer and a reaction wheel. The method includes a compensation for the increase of moment of inertia due to adding the measurement device to the body, and additionally obtains the location of the centre of gravity of the body as an intermediate result. Experiments performed with known rigid bodies show that the mean accuracy is around 2%.
comment: \c{opyright} 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework
The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations -- one of the essential experimental technologies -- remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It incorporates three proposed modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from Daline and MATPOWER, this framework achieves success rates of 93.13% and 96.85%, respectively. It significantly outperforms ChatGPT 4o, o1-preview, and the fine-tuned GPT-4o, which all achieved a success rate lower than 30% on complex tasks. Additionally, the proposed framework also supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average cost of 0.014 USD for tokens. Overall, this adaptable framework lays a foundation for developing intelligent LLM-based assistants for human researchers, facilitating power system research and beyond.
comment: 16 pages
Characterization of input-to-output stability for infinite-dimensional systems
We prove a superposition theorem for input-to-output stability (IOS) of a broad class of nonlinear infinite-dimensional systems with outputs including both continuous-time and discrete-time systems. It contains, as a special case, the superposition theorem for input-to-state stability (ISS) of infinite-dimensional systems and the IOS superposition theorem for systems of ordinary differential equations known from the literature. To achieve this result, we introduce and examine several novel stability and attractivity concepts for infinite-dimensional systems with outputs: We prove criteria for the uniform limit property for systems with outputs, several of which are new already for systems with full-state output, we provide superposition theorems for systems which satisfy both the output Lagrange stability (OL) and IOS, give a sufficient condition for OL and characterize ISS in terms of IOS and input/output-to-state stability. Finally, by means of counterexamples, we illustrate the challenges appearing on the way of extension of the superposition theorems from the literature to infinite-dimensional systems with outputs.
Modeling Nonlinear Oscillator Networks Using Physics-Informed Hybrid Reservoir Computing
Surrogate modeling of non-linear oscillator networks remains challenging due to discrepancies between simplified analytical models and real-world complexity. To bridge this gap, we investigate hybrid reservoir computing, combining reservoir computing with "expert" analytical models. Simulating the absence of an exact model, we first test the surrogate models with parameter errors in their expert model. Second, in a residual physics task, we assess their performance when their expert model lacks key non-linear coupling terms present in an extended ground-truth model. We focus on short-term forecasting across diverse dynamical regimes, evaluating the use of these surrogates for control applications. We show that hybrid reservoir computers generally outperform standard reservoir computers and exhibit greater robustness to parameter tuning. This advantage is less pronounced in the residual physics task. Notably, unlike standard reservoir computers, the performance of the hybrid does not degrade when crossing an observed spectral radius threshold. Furthermore, there is good performance for dynamical regimes not accessible to the expert model, demonstrating the contribution of the reservoir.
comment: 32 pages, 10 figures, 21 supplementary figures. Updated in response to review. Code at https://github.com/AJS50/Hybrid_RC_for_NLONS_paper_code
Symmetric Solutions to Symmetric Partial Difference Equations
This paper studies systems of linear difference equations on the lattice $\Z^n$ that are invariant under a finite group of symmetries, and shows that there exist solutions to such systems that are also invariant under this group of symmetries.
comment: Theorem 4.2 in the earlier arxiv version does not charaterise all the symmetric solutions to a symmetric system of difference equations on $\Z^n,$ as claimed. Thus, the statement on the dimension of the space of symmetric solutions is an underestimate
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
comment: This work was intended as a replacement of arXiv:2406.07797 and any subsequent updates will appear there
Beyond inherent robustness: strong stability of MPC despite plant-model mismatch
In this technical report, we establish the asymptotic stability of MPC under plant-model mismatch for problems where the origin remains a steady state despite mismatch. This class of problems includes, but is not limited to, inventory management, path-planning, and control of systems in deviation variables. Our results differ from prior results on the inherent robustness of MPC, which guarantee only convergence to a neighborhood of the origin, the size of which scales with the magnitude of the mismatch. For MPC with quadratic costs, continuous differentiability of the system dynamics is sufficient to demonstrate exponential stability of the closed-loop system despite mismatch. For MPC with general costs, a joint comparison function bound and scaling condition guarantee asymptotic stability despite mismatch. The results are illustrated in numerical simulations, including the classic upright pendulum problem. The tools developed to establish these results can address the stability of offset-free MPC, an open and interesting question in the MPC research literature.
comment: 45 pages, 9 figures
Distributed Model Predictive Control Design for Multi-agent Systems via Bayesian Optimization
This paper introduces a new approach that leverages Multi-agent Bayesian Optimization (MABO) to design Distributed Model Predictive Control (DMPC) schemes for multi-agent systems. The primary objective is to learn optimal DMPC schemes even when local model predictive controllers rely on imperfect local models. The proposed method invokes a dual decomposition-based distributed optimization framework, incorporating an Alternating Direction Method of Multipliers (ADMM)-based MABO algorithm to enable coordinated learning of parameterized DMPC schemes. This enhances the closed-loop performance of local controllers, despite discrepancies between their models and the actual multi-agent system dynamics. In addition to the newly proposed algorithms, this work also provides rigorous proofs establishing the optimality and convergence of the underlying learning method. Finally, numerical examples are given to demonstrate the efficacy of the proposed MABO-based learning approach.
Stochastic Learning of Computational Resource Usage as Graph Structured Multimarginal Schrödinger Bridge
We propose to learn the time-varying stochastic computational resource usage of software as a graph structured Schr\"odinger bridge problem. In general, learning the computational resource usage from data is challenging because resources such as the number of CPU instructions and the number of last level cache requests are both time-varying and statistically correlated. Our proposed method enables learning the joint time-varying stochasticity in computational resource usage from the measured profile snapshots in a nonparametric manner. The method can be used to predict the most-likely time-varying distribution of computational resource availability at a desired time. We provide detailed algorithms for stochastic learning in both single and multi-core cases, discuss the convergence guarantees, computational complexities, and demonstrate their practical use in two case studies: a single-core nonlinear model predictive controller, and a synthetic multi-core software.
Process Resilience under Optimal Data Injection Attacks
In this paper, we study the resilience of process systems in an {\it information-theoretic framework}, from the perspective of an attacker capable of optimally constructing data injection attacks. The attack aims to distract the stationary distributions of process variables and stay stealthy, simultaneously. The problem is formulated as designing a multivariate Gaussian distribution to maximize the Kullback-Leibler divergence between the stationary distributions of states and state estimates under attacks and without attacks, while minimizing that between the distributions of sensor measurements. When the attacker has limited access to sensors, sparse attacks are proposed by incorporating a sparsity constraint. {We conduct theoretical analysis on the convexity of the attack construction problem and present a greedy algorithm, which enables systematic assessment of measurement vulnerability, thereby offering insights into the inherent resilience of process systems. We numerically evaluate the performance of proposed constructions on a two-reactor process.
comment: 46 pages, 5 figures, published in AIChE journal, May 2025
Systems and Control (EESS)
An Empirical Bayes approach to ARX Estimation
Empirical Bayes inference is based on estimation of the parameters of an a priori distribution from the observed data. The estimation technique of the parameters of the prior, called hyperparameters, is based on the marginal distribution obtained by integrating the joint density of the model with respect to the prior. This is a key step which needs to be properly adapted to the problem at hand. In this paper we study Empirical Bayes inference of linear autoregressive models with inputs (ARX models) for time series and compare the performance of the marginal parametric estimator with that a full Empirical Bayesian analysis based on the estimated prior. Such a comparison, can only make sense for a (realistic) finite data length. In this setting, we propose a new estimation technique of the hyperparameters by a sequential Bayes procedure which is essentially a backward Kalman filter. It turns out that for finite data length the marginal Bayes tends to behave slightly better than the full Empirical Bayesian parameter estimator and so also in the case of slowly varying random parameters.
Quantum Hardware-in-the-Loop for Optimal Power Flow in Renewable-Integrated Power Systems
This paper presents a proof-of-concept for integrating quantum hardware with real-time digital simulator (RTDS) to model and control modern power systems, including renewable energy resources. Power flow (PF) analysis and optimal power flow (OPF) studies are conducted using RTDS coupled with Fujitsu's CMOS Digital Annealer and D-Wave's Advantage quantum processors. The adiabatic quantum power flow (AQPF) and adiabatic quantum optimal power flow (AQOPF) algorithms are used to perform PF and OPF, respectively, on quantum and quantum-inspired hardware. The experiments are performed on the IEEE 9-bus test system and a modified version that includes solar and wind farms. The findings demonstrate that the AQPF and AQOPF algorithms can accurately perform PF and OPF, yielding results that closely match those of classical Newton-Raphson (NR) solvers while also exhibiting robust convergence. Furthermore, the integration of renewable energy sources (RES) within the AQOPF framework proves effective in maintaining system stability and performance, even under variable generation conditions. These findings highlight the potential of quantum computing to significantly enhance the modeling and control of future power grids, particularly in systems with high renewable energy penetration.
Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems
We investigate multitask edge-user communication-computation resource allocation for $360^\circ$ video streaming in an edge-computing enabled millimeter wave (mmWave) multi-user virtual reality system. To balance the communication-computation trade-offs that arise herein, we formulate a video quality maximization problem that integrates interdependent multitask/multi-user action spaces and rebuffering time/quality variation constraints. We formulate a deep reinforcement learning framework for \underline{m}ulti-\underline{t}ask \underline{r}ate adaptation and \underline{c}omputation distribution (MTRC) to solve the problem of interest. Our solution does not rely on a priori knowledge about the environment and uses only prior video streaming statistics (e.g., throughput, decoding time, and transmission delay), and content information, to adjust the assigned video bitrates and computation distribution, as it observes the induced streaming performance online. Moreover, to capture the task interdependence in the environment, we leverage neural network cascades to extend our MTRC method to two novel variants denoted as R1C2 and C1R2. We train all three methods with real-world mmWave network traces and $360^\circ$ video datasets to evaluate their performance in terms of expected quality of experience (QoE), viewport peak signal-to-noise ratio (PSNR), rebuffering time, and quality variation. We outperform state-of-the-art rate adaptation algorithms, with C1R2 showing best results and achieving $5.21-6.06$ dB PSNR gains, $2.18-2.70$x rebuffering time reduction, and $4.14-4.50$ dB quality variation reduction.
comment: Accepted to be published in IEEE Transaction on Multimedia
Learning Driven Elastic Task Multi-Connectivity Immersive Computing Systems
In virtual reality (VR) environments, computational tasks exhibit an elastic nature, meaning they can dynamically adjust based on various user and system constraints. This elasticity is essential for maintaining immersive experiences; however, it also introduces challenges for communication and computing in VR systems. In this paper, we investigate elastic task offloading for multi-user edge-computing-enabled VR systems with multi-connectivity, aiming to maximize the computational energy-efficiency (computational throughput per unit of energy consumed). To balance the induced communication, computation, energy consumption, and quality of experience trade-offs due to the elasticity of VR tasks, we formulate a constrained stochastic computational energy-efficiency optimization problem that integrates the multi-connectivity/multi-user action space and the elastic nature of VR computational tasks. We formulate a centralized phasic policy gradient (CPPG) framework to solve the problem of interest online, using only prior elastic task offloading statistics (energy consumption, response time, and transmission time), and task information (i.e., task size and computational intensity), while observing the induced system performance (energy consumption and latency). We further extend our approach to decentralized learning by formulating an independent phasic policy gradient (IPPG) method and a decentralized shared multi-armed bandit (DSMAB) method. We train our methods with real-world 4G, 5G, and WiGig network traces and 360 video datasets to evaluate their performance in terms of response time, energy efficiency, scalability, and delivered quality of experience. We also provide a comprehensive analysis of task size and its effect on offloading policy and system performance. In particular, we show that CPPG reduces latency by 28% and energy consumption by 78% compared to IPPG.
comment: Under review by IEEE Transaction on Mobile Computing
Frequency-Dependent Power Consumption Modeling of CMOS Transmitters for WNoC Architectures
Wireless Network-on-Chip (WNoC) systems, which wirelessly interconnect the chips of a computing system, have been proposed as a complement to existing chip-to-chip wired links. However, their feasibility depends on the availability of custom-designed high-speed, tiny, ultra-efficient transceivers. This represents a challenge due to the tradeoffs between bandwidth, area, and energy efficiency that are found as frequency increases, which suggests that there is an optimal frequency region. To aid in the search for such an optimal design point, this paper presents a behavioral model that quantifies the expected power consumption of oscillators, mixers, and power amplifiers as a function of frequency. The model is built on extensive surveys of the respective sub-blocks, all based on experimental data. By putting together the models of the three sub-blocks, a comprehensive power model is obtained, which will aid in selecting the optimal operating frequency for WNoC systems.
Output behavior equivalence and simultaneous subspace identification of systems and faults
We address the problem of identifying a system subject to additive faults, while simultaneously reconstructing the fault signal via subspace methods. We do not require nominal data for the identification, neither do we impose any assumption on the class of faults, e.g., sensor or actuator faults. We show that, under mild assumptions on the fault signal, standard PI-MOESP can recover the system matrices associated to the input-output subsystem. Then we introduce the concept of output behavior equivalence, which characterizes systems with the same output behavior set, and present a method to establish this equivalence from system matrices. Finally, we show how to estimate from data the complete set of fault matrices for which there exist a fault signal with minimal dimension that explains the data.
comment: 7 pages, 2 figures. Under review. To reproduce the results of this preprint, see https://github.com/ggleizer/fault_id
Low-regret Strategies for Energy Systems Planning in a Highly Uncertain Future
Large uncertainties in the energy transition urge decision-makers to develop low-regret strategies, i.e., strategies that perform well regardless of how the future unfolds. To address this challenge, we introduce a decision-support framework that identifies low-regret strategies in energy system planning under uncertainty. Our framework (i) automatically identifies strategies, (ii) evaluates their performance in terms of regret, (iii) assesses the key drivers of regret, and (iv) supports the decision process with intuitive decision trees, regret curves and decision maps. We apply the framework to evaluate the optimal use of biomass in the transition to net-zero energy systems, considering all major biomass utilization options: biofuels, biomethane, chemicals, hydrogen, biochar, electricity, and heat. Producing fuels and chemicals from biomass performs best across various decision-making criteria. In contrast, the current use of biomass, mainly for low-temperature heat supply, results in high regret, making it a must-avoid in the energy transition.
comment: Main paper: 22 pages, 5 figures; Supplementary Information: 70 pages, 100 figures
Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing
Diffusion models hold great potential in robotics due to their ability to capture complex, high-dimensional data distributions. However, their lack of constraint-awareness limits their deployment in safety-critical applications. We propose Constraint-Aware Diffusion Guidance (CoDiG), a data-efficient and general-purpose framework that integrates barrier functions into the denoising process, guiding diffusion sampling toward constraint-satisfying outputs. CoDiG enables constraint satisfaction even with limited training data and generalizes across tasks. We evaluate our framework in the challenging setting of miniature autonomous racing, where real-time obstacle avoidance is essential. Real-world experiments show that CoDiG generates safe outputs efficiently under dynamic conditions, highlighting its potential for broader robotic applications. A demonstration video is available at https://youtu.be/KNYsTdtdxOU.
RSS-Based Localization: Ensuring Consistency and Asymptotic Efficiency
We study the problem of signal source localization using received signal strength measurements. We begin by presenting verifiable geometric conditions for sensor deployment that ensure the model's asymptotic localizability. Then we establish the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, computing the ML estimator is challenging due to its reliance on solving a non-convex optimization problem. To overcome this, we propose a two-step estimator that retains the same asymptotic properties as the ML estimator while offering low computational complexity, linear in the number of measurements. The main challenge lies in obtaining a consistent estimator in the first step. To address this, we construct two linear least-squares estimation problems by applying algebraic transformations to the nonlinear measurement model, leading to closed-form solutions. In the second step, we perform a single Gauss-Newton iteration using the consistent estimator from the first step as the initialization, achieving the same asymptotic efficiency as the ML estimator. Finally, simulation results validate the theoretical property and practical effectiveness of the proposed two-step estimator.
Disentangling Coordiante Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers
Task performance in terms of task completion time in teleoperation is still far behind compared to humans conducting tasks directly. One large identified impact on this is the human capability to perform transformations and alignments, which is directly influenced by the point of view and the motion retargeting strategy. In modern teleoperation systems, motion retargeting is usually implemented through a one time calibration or switching modes. Complex tasks, like concatenated screwing, might be difficult, because the operator has to align (e.g. mirror) rotational and translational input commands. Recent research has shown, that the separation of translation and rotation leads to increased task performance. This work proposes a formal motion retargeting method, which separates translational and rotational input commands. This method is then included in a optimal control based trajectory planner and shown to work on a UR5e manipulator.
comment: 8 pages, 4 figures, conference
Regularized Model Predictive Control
In model predictive control (MPC), the choice of cost-weighting matrices and designing the Hessian matrix directly affects the trade-off between rapid state regulation and minimizing the control effort. However, traditional MPC in quadratic programming relies on fixed design matrices across the entire horizon, which can lead to suboptimal performance. This letter presents a Riccati equation-based method for adjusting the design matrix within the MPC framework, which enhances real-time performance. We employ a penalized least-squares (PLS) approach to derive a quadratic cost function for a discrete-time linear system over a finite prediction horizon. Using the method of weighting and enforcing the constraint equation by introducing a large penalty parameter, we solve the constrained optimization problem and generate control inputs for forward-shifted horizons. This process yields a recursive PLS-based Riccati equation that updates the design matrix as a regularization term in each shift, forming the foundation of the regularized MPC (Re-MPC) algorithm. To accomplish this, we provide a convergence and stability analysis of the developed algorithm. Numerical analysis demonstrates its superiority over traditional methods by allowing Riccati equation-based adjustments.
6G-Enabled Smart Railways
Smart railways integrate advanced information technologies into railway operating systems to improve efficiency and reliability. Although the development of 5G has enhanced railway services, future smart railways require ultra-high speeds, ultra-low latency, ultra-high security, full coverage, and ultra-high positioning accuracy, which 5G cannot fully meet. Therefore, 6G is envisioned to provide green and efficient all-day operations, strong information security, fully automatic driving, and low-cost intelligent maintenance. To achieve these requirements, we propose an integrated network architecture leveraging communications, computing, edge intelligence, and caching in railway systems. We have conducted in-depth investigations on key enabling technologies for reliable transmissions and wireless coverage. For high-speed mobile scenarios, we propose an AI-enabled cross-domain channel modeling and orthogonal time-frequency space-time spread multiple access mechanism to alleviate the conflict between limited spectrum availability and massive user access. The roles of blockchain, edge intelligence, and privacy technologies in endogenously secure rail communications are also evaluated. We further explore the application of emerging paradigms such as integrated sensing and communications, AI-assisted Internet of Things, semantic communications, and digital twin networks for railway maintenance, monitoring, prediction, and accident warning. Finally, possible future research and development directions are discussed.
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach
The pursuit of rate maximization in wireless communication frequently encounters substantial challenges associated with user fairness. This paper addresses these challenges by exploring a novel power allocation approach for delay optimization, utilizing graph neural networks (GNNs)-based reinforcement learning (RL) in device-to-device (D2D) communication. The proposed approach incorporates not only channel state information but also factors such as packet delay, the number of backlogged packets, and the number of transmitted packets into the components of the state information. We adopt a centralized RL method, where a central controller collects and processes the state information. The central controller functions as an agent trained using the proximal policy optimization (PPO) algorithm. To better utilize topology information in the communication network and enhance the generalization of the proposed method, we embed GNN layers into both the actor and critic networks of the PPO algorithm. This integration allows for efficient parameter updates of GNNs and enables the state information to be parameterized as a low-dimensional embedding, which is leveraged by the agent to optimize power allocation strategies. Simulation results demonstrate that the proposed method effectively reduces average delay while ensuring user fairness, outperforms baseline methods, and exhibits scalability and generalization capability.
Scheduling of Flexible Manufacturing Systems Based on Place-Timed Petri Nets and Basis Reachability Graphs
Scheduling is a key decision-making process to improve the performance of flexible manufacturing systems. Place-timed Petri nets provide a formal method for graphically modeling and analyzing such systems. By generating reachability graphs and combining intelligent search algorithms, operation sequences from the initial state to the target state can be found for the underlying system. However, the reachability graph grows exponentially with the system size increases, which is the main challenge of existing methods for scheduling large systems. To this end, we develop an efficient improved beam search algorithm to optimize the makespan based on a compact representation of reachability graph called basis reachability graph. The key idea behind the proposed method is to form a state together with the basis markings and its corresponding transition sequences, and evaluate the cost of the state based on the resource idle time. Experimental results are conducted on several benchmark systems which show that the developed method improves the search efficiency while ensuring the quality of the solution compared with existing methods.
2T1R Regulated Memristor Conductance Control Array Architecture for Neuromorphic Computing using 28nm CMOS Technology
Memristors are promising devices for scalable and low power, in-memory computing to improve the energy efficiency of a rising computational demand. The crossbar array architecture with memristors is used for vector matrix multiplication (VMM) and acts as kernels in neuromorphic computing. The analog conductance control in a memristor is achieved by applying voltage or current through it. A basic 1T1R array is suitable to avoid sneak path issues but suffer from wire resistances, which affects the read and write procedures. A conductance control scheme with a regulated voltage source will improve the architecture and reduce the possible potential divider effects. A change in conductance is also possible with the provision of a regulated current source and measuring the voltage across the memristors. A regulated 2T1R memristor conductance control architecture is proposed in this work, which avoids the potential divider effect and virtual ground scenario in a regular crossbar scheme, as well as conductance control by passing a regulated current through memristors. The sneak path current is not allowed to pass by the provision of ground potential to both terminals of memristors.
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements are developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
Secrecy Capacity of Hybrid VLC-RF Systems with Light Energy Harvesting
This paper studies the performance of physical layer security (PLS) in a multi-user hybrid heterogeneous visible light communication (VLC) and radio frequency (RF) wireless communication system with simultaneous lightwave information and power transfer (SLIPT). In the considered system, VLC is used for downlink (DL) while RF is employed for uplink (UL) transmission. In addition, to support multiple users, time division multiple access (TDMA) is assumed for both DL and UL channels. In the DL, each user receives information during its allocated time slot of the TDMA frame and harvests energy from the received signal outside the time slot. The harvested energy is then used for transmitting the signal over the UL channel, which is subject to eavesdropping by an unauthorized user. Therefore, PLS is employed to protect the confidentiality of the UL information. Then, an optimization problem is formulated to solve the optimal DL and UL time slots that maximize the PLS performance given a target sum rate of the DL. We show that the problem can be cast as a difference of convex functions (DC) program, which can be solved efficiently using the DC algorithm (DCA).
A Control Oriented Fractional-Order Model of Lithium-ion Batteries Based on Caputo Definition
This letter proposes a fractional-order battery model based on the Caputo definition. A closed-form time-domain solution is derived, enabling a simple recursive expression for discrete-time implementation. The model requires only the current and previous time-step states in each iteration, significantly reducing memory usage compared to the conventional Gr\"{u}nwald--Letnikov (G-L) method. This recursive structure is highly compatible with filter design and online parameter identification. Experimental validation on a 40.2~Ah NCM622 cell shows that the proposed first-order model achieves voltage prediction accuracy comparable to a second-order integer-order model. The results demonstrate that the Caputo-based model offers a practical balance between accuracy and computational efficiency, making it well suited for real-time battery management systems (BMS).
Transmission Neural Networks: Approximation and Optimal Control
Transmission Neural Networks (TransNNs) introduced by Gao and Caines (2022) connect virus spread models over networks and neural networks with tuneable activation functions. This paper presents the approximation technique and the underlying assumptions employed by TransNNs in relation to the corresponding Markovian Susceptible-Infected-Susceptible (SIS) model with 2^n states, where n is the number of nodes in the network. The underlying infection paths are assumed to be stochastic with heterogeneous and time-varying transmission probabilities. We obtain the conditional probability of infection in the stochastic 2^n-state SIS epidemic model corresponding to each state configuration under mild assumptions, which enables control solutions based on Markov decision processes (MDP). Finally, MDP control with 2^n-state SIS epidemic models and optimal control with TransNNs are compared in terms of mitigating virus spread over networks through vaccination, and it is shown that TranNNs enable the generation of control laws with significant computational savings, albeit with more conservative control actions.
MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter
To overcome the limitation of existing indoor odometry technologies which often cannot simultaneously meet requirements for accuracy cost-effectiveness, and robustness-this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurements from the magnetometer array and then use temporal variations in this model-extracted from continuous observations-to estimate the carrier's absolute velocity. Furthermore, we implement the MSCEKF framework to fuse observed magnetic field variations with position and attitude estimates from inertial navigation system (INS) integration, thereby enabling autonomous, high-precision indoor relative positioning. Experimental results demonstrate that the proposed algorithm achieves superior velocity estimation accuracy and horizontal positioning precision relative to state-of-the-art magnetic array-aided INS algorithms (MAINS). On datasets with trajectory lengths of 150-250m, the proposed method yields an average horizontal position RMSE of approximately 2.5m. In areas with distinctive magnetic features, the magneto-inertial odometry achieves a velocity estimation accuracy of 0.07m/s. Consequently, the proposed method offers a novel positioning solution characterized by low power consumption, cost-effectiveness, and high reliability in complex indoor environments.
comment: 10 pages
Incremental Firmware Update Over-the-Air for Low-Power IoT Devices over LoRaWAN
Efficiently supporting remote firmware updates in Internet of Things (IoT) devices remains a significant challenge due to the limitations of many IoT communication protocols, which often make it impractical to transmit full firmware images. Techniques such as firmware partitioning have been introduced to mitigate this issue, but they frequently fall short, especially in battery-powered systems where time and energy constraints are critical. As a result, physical maintenance interventions are still commonly required, which is costly and inconvenient in large-scale deployments. In this work, we present a lightweight and innovative method that addresses this challenge by generating highly compact delta patches, enabling firmware reconstruction directly on the device. Our algorithm is specifically optimized for low-power devices, minimizing both memory usage and computational overhead. Compared to existing solutions, our approach significantly reduces the data volume needed for updates while maintaining performance comparable to more complex alternatives. Experimental evaluations confirm that our method yields substantial time and energy savings, making it particularly well-suited for battery-powered IoT nodes. Although our implementation targets the LoRaWAN protocol, the approach is flexible and can be adapted to other IoT communication technologies.
Dynamic Bipedal MPC with Foot-level Obstacle Avoidance and Adjustable Step Timing
Collision-free planning is essential for bipedal robots operating within unstructured environments. This paper presents a real-time Model Predictive Control (MPC) framework that addresses both body and foot avoidance for dynamic bipedal robots. Our contribution is two-fold: we introduce (1) a novel formulation for adjusting step timing to facilitate faster body avoidance and (2) a novel 3D foot-avoidance formulation that implicitly selects swing trajectories and footholds that either steps over or navigate around obstacles with awareness of Center of Mass (COM) dynamics. We achieve body avoidance by applying a half-space relaxation of the safe region but introduce a switching heuristic based on tracking error to detect a need to change foot-timing schedules. To enable foot avoidance and viable landing footholds on all sides of foot-level obstacles, we decompose the non-convex safe region on the ground into several convex polygons and use Mixed-Integer Quadratic Programming to determine the optimal candidate. We found that introducing a soft minimum-travel-distance constraint is effective in preventing the MPC from being trapped in local minima that can stall half-space relaxation methods behind obstacles. We demonstrated the proposed algorithms on multibody simulations on the bipedal robot platforms, Cassie and Digit, as well as hardware experiments on Digit.
Reduced Muscle Fatigue Using Continuous Subthreshold Kilohertz Stimulation of Peripheral Nerves
Functional electrical stimulation (FES) is a prevalent technique commonly used to activate muscles in individuals with neurological disorders. Traditional FES strategies predominantly utilize low-frequency (LF) stimulation, which evokes synchronous action potentials, leading to rapid muscle fatigue. To address these limitations, we introduced a subthreshold high-frequency (HF) stimulation method that employed continuous, charge-balanced subthreshold current pulses at kilohertz frequencies, designed to evoke motor unit (MU) activation similar to voluntary activation. We evaluated the effectiveness of HF stimulation on the reduction of muscle fatigue across different force levels (10 %, 25 %, and 40 % of maximum force). The HF stimulation utilized continuous charge-balanced, short pulses of 80 {\mu}s (at a 10 kHz frequency) targeted the ulnar/median nerve bundles. We compared the fatigue effects with conventional LF stimulation and voluntary muscle contractions. Our results indicated that HF stimulation maintained more sustained force outputs and muscle activation over a prolonged time compared with LF stimulation. The HF stimulation also evoked a more dispersed muscle activation pattern, similar to voluntary muscle contractions. These findings suggest that HF stimulation can significantly enhance the sustainability of muscle contractions and reduce muscle fatigue, potentially improving the efficacy and applicability of FES in clinical and home-based settings for individuals with neurological impairments.
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
State-Dependent Conformal Perception Bounds for Neuro-Symbolic Verification of Autonomous Systems
It remains a challenge to provide safety guarantees for autonomous systems with neural perception and control. A typical approach obtains symbolic bounds on perception error (e.g., using conformal prediction) and performs verification under these bounds. However, these bounds can lead to drastic conservatism in the resulting end-to-end safety guarantee. This paper proposes an approach to synthesize symbolic perception error bounds that serve as an optimal interface between perception performance and control verification. The key idea is to consider our error bounds to be heteroskedastic with respect to the system's state -- not time like in previous approaches. These bounds can be obtained with two gradient-free optimization algorithms. We demonstrate that our bounds lead to tighter safety guarantees than the state-of-the-art in a case study on a mountain car.
comment: Accepted at the 2nd International Conference on Neuro-symbolic Systems (NeuS) 2025 to NeuS'25
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
Rapid and Inexpensive Inertia Tensor Estimation from a Single Object Throw
The inertia tensor is an important parameter in many engineering fields, but measuring it can be cumbersome and involve multiple experiments or accurate and expensive equipment. We propose a method to measure the moment of inertia tensor of a rigid body from a single spinning throw, by attaching a small and inexpensive stand-alone measurement device consisting of a gyroscope, accelerometer and a reaction wheel. The method includes a compensation for the increase of moment of inertia due to adding the measurement device to the body, and additionally obtains the location of the centre of gravity of the body as an intermediate result. Experiments performed with known rigid bodies show that the mean accuracy is around 2%.
comment: \c{opyright} 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework
The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations -- one of the essential experimental technologies -- remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It incorporates three proposed modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from Daline and MATPOWER, this framework achieves success rates of 93.13% and 96.85%, respectively. It significantly outperforms ChatGPT 4o, o1-preview, and the fine-tuned GPT-4o, which all achieved a success rate lower than 30% on complex tasks. Additionally, the proposed framework also supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average cost of 0.014 USD for tokens. Overall, this adaptable framework lays a foundation for developing intelligent LLM-based assistants for human researchers, facilitating power system research and beyond.
comment: 16 pages
Characterization of input-to-output stability for infinite-dimensional systems
We prove a superposition theorem for input-to-output stability (IOS) of a broad class of nonlinear infinite-dimensional systems with outputs including both continuous-time and discrete-time systems. It contains, as a special case, the superposition theorem for input-to-state stability (ISS) of infinite-dimensional systems and the IOS superposition theorem for systems of ordinary differential equations known from the literature. To achieve this result, we introduce and examine several novel stability and attractivity concepts for infinite-dimensional systems with outputs: We prove criteria for the uniform limit property for systems with outputs, several of which are new already for systems with full-state output, we provide superposition theorems for systems which satisfy both the output Lagrange stability (OL) and IOS, give a sufficient condition for OL and characterize ISS in terms of IOS and input/output-to-state stability. Finally, by means of counterexamples, we illustrate the challenges appearing on the way of extension of the superposition theorems from the literature to infinite-dimensional systems with outputs.
Modeling Nonlinear Oscillator Networks Using Physics-Informed Hybrid Reservoir Computing
Surrogate modeling of non-linear oscillator networks remains challenging due to discrepancies between simplified analytical models and real-world complexity. To bridge this gap, we investigate hybrid reservoir computing, combining reservoir computing with "expert" analytical models. Simulating the absence of an exact model, we first test the surrogate models with parameter errors in their expert model. Second, in a residual physics task, we assess their performance when their expert model lacks key non-linear coupling terms present in an extended ground-truth model. We focus on short-term forecasting across diverse dynamical regimes, evaluating the use of these surrogates for control applications. We show that hybrid reservoir computers generally outperform standard reservoir computers and exhibit greater robustness to parameter tuning. This advantage is less pronounced in the residual physics task. Notably, unlike standard reservoir computers, the performance of the hybrid does not degrade when crossing an observed spectral radius threshold. Furthermore, there is good performance for dynamical regimes not accessible to the expert model, demonstrating the contribution of the reservoir.
comment: 32 pages, 10 figures, 21 supplementary figures. Updated in response to review. Code at https://github.com/AJS50/Hybrid_RC_for_NLONS_paper_code
Symmetric Solutions to Symmetric Partial Difference Equations
This paper studies systems of linear difference equations on the lattice $\Z^n$ that are invariant under a finite group of symmetries, and shows that there exist solutions to such systems that are also invariant under this group of symmetries.
comment: Theorem 4.2 in the earlier arxiv version does not charaterise all the symmetric solutions to a symmetric system of difference equations on $\Z^n,$ as claimed. Thus, the statement on the dimension of the space of symmetric solutions is an underestimate
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
comment: This work was intended as a replacement of arXiv:2406.07797 and any subsequent updates will appear there
Beyond inherent robustness: strong stability of MPC despite plant-model mismatch
In this technical report, we establish the asymptotic stability of MPC under plant-model mismatch for problems where the origin remains a steady state despite mismatch. This class of problems includes, but is not limited to, inventory management, path-planning, and control of systems in deviation variables. Our results differ from prior results on the inherent robustness of MPC, which guarantee only convergence to a neighborhood of the origin, the size of which scales with the magnitude of the mismatch. For MPC with quadratic costs, continuous differentiability of the system dynamics is sufficient to demonstrate exponential stability of the closed-loop system despite mismatch. For MPC with general costs, a joint comparison function bound and scaling condition guarantee asymptotic stability despite mismatch. The results are illustrated in numerical simulations, including the classic upright pendulum problem. The tools developed to establish these results can address the stability of offset-free MPC, an open and interesting question in the MPC research literature.
comment: 45 pages, 9 figures
Distributed Model Predictive Control Design for Multi-agent Systems via Bayesian Optimization
This paper introduces a new approach that leverages Multi-agent Bayesian Optimization (MABO) to design Distributed Model Predictive Control (DMPC) schemes for multi-agent systems. The primary objective is to learn optimal DMPC schemes even when local model predictive controllers rely on imperfect local models. The proposed method invokes a dual decomposition-based distributed optimization framework, incorporating an Alternating Direction Method of Multipliers (ADMM)-based MABO algorithm to enable coordinated learning of parameterized DMPC schemes. This enhances the closed-loop performance of local controllers, despite discrepancies between their models and the actual multi-agent system dynamics. In addition to the newly proposed algorithms, this work also provides rigorous proofs establishing the optimality and convergence of the underlying learning method. Finally, numerical examples are given to demonstrate the efficacy of the proposed MABO-based learning approach.
Stochastic Learning of Computational Resource Usage as Graph Structured Multimarginal Schrödinger Bridge
We propose to learn the time-varying stochastic computational resource usage of software as a graph structured Schr\"odinger bridge problem. In general, learning the computational resource usage from data is challenging because resources such as the number of CPU instructions and the number of last level cache requests are both time-varying and statistically correlated. Our proposed method enables learning the joint time-varying stochasticity in computational resource usage from the measured profile snapshots in a nonparametric manner. The method can be used to predict the most-likely time-varying distribution of computational resource availability at a desired time. We provide detailed algorithms for stochastic learning in both single and multi-core cases, discuss the convergence guarantees, computational complexities, and demonstrate their practical use in two case studies: a single-core nonlinear model predictive controller, and a synthetic multi-core software.
Process Resilience under Optimal Data Injection Attacks
In this paper, we study the resilience of process systems in an {\it information-theoretic framework}, from the perspective of an attacker capable of optimally constructing data injection attacks. The attack aims to distract the stationary distributions of process variables and stay stealthy, simultaneously. The problem is formulated as designing a multivariate Gaussian distribution to maximize the Kullback-Leibler divergence between the stationary distributions of states and state estimates under attacks and without attacks, while minimizing that between the distributions of sensor measurements. When the attacker has limited access to sensors, sparse attacks are proposed by incorporating a sparsity constraint. {We conduct theoretical analysis on the convexity of the attack construction problem and present a greedy algorithm, which enables systematic assessment of measurement vulnerability, thereby offering insights into the inherent resilience of process systems. We numerically evaluate the performance of proposed constructions on a two-reactor process.
comment: 46 pages, 5 figures, published in AIChE journal, May 2025
Robotics
ProMi: An Efficient Prototype-Mixture Baseline for Few-Shot Segmentation with Bounding-Box Annotations
In robotics applications, few-shot segmentation is crucial because it allows robots to perform complex tasks with minimal training data, facilitating their adaptation to diverse, real-world environments. However, pixel-level annotations of even small amount of images is highly time-consuming and costly. In this paper, we present a novel few-shot binary segmentation method based on bounding-box annotations instead of pixel-level labels. We introduce, ProMi, an efficient prototype-mixture-based method that treats the background class as a mixture of distributions. Our approach is simple, training-free, and effective, accommodating coarse annotations with ease. Compared to existing baselines, ProMi achieves the best results across different datasets with significant gains, demonstrating its effectiveness. Furthermore, we present qualitative experiments tailored to real-world mobile robot tasks, demonstrating the applicability of our approach in such scenarios. Our code: https://github.com/ThalesGroup/promi.
Robust Reinforcement Learning-Based Locomotion for Resource-Constrained Quadrupeds with Exteroceptive Sensing ICRA
Compact quadrupedal robots are proving increasingly suitable for deployment in real-world scenarios. Their smaller size fosters easy integration into human environments. Nevertheless, real-time locomotion on uneven terrains remains challenging, particularly due to the high computational demands of terrain perception. This paper presents a robust reinforcement learning-based exteroceptive locomotion controller for resource-constrained small-scale quadrupeds in challenging terrains, which exploits real-time elevation mapping, supported by a careful depth sensor selection. We concurrently train both a policy and a state estimator, which together provide an odometry source for elevation mapping, optionally fused with visual-inertial odometry (VIO). We demonstrate the importance of positioning an additional time-of-flight sensor for maintaining robustness even without VIO, thus having the potential to free up computational resources. We experimentally demonstrate that the proposed controller can flawlessly traverse steps up to 17.5 cm in height and achieve an 80% success rate on 22.5 cm steps, both with and without VIO. The proposed controller also achieves accurate forward and yaw velocity tracking of up to 1.0 m/s and 1.5 rad/s respectively. We open-source our training code at github.com/ETH-PBL/elmap-rl-controller.
comment: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Atlanta 2025. The code is available at github.com/ETH-PBL/elmap-rl-controller
Development of a non-wearable support robot capable of reproducing natural standing-up movements
To reproduce natural standing-up motion, recent studies have emphasized the importance of coordination between the assisting robot and the human. However, many non-wearable assistive devices have struggled to replicate natural motion trajectories. While wearable devices offer better coordination with the human body, they present challenges in completely isolating mechanical and electrical hazards. To address this, we developed a novel standing-assist robot that integrates features of both wearable and non-wearable systems, aiming to achieve high coordination while maintaining safety. The device employs a four-link mechanism aligned with the human joint structure, designed to reproduce the S-shaped trajectory of the hip and the arc trajectory of the knee during natural standing-up motion. Subject-specific trajectory data were obtained using a gyroscope, and the link lengths were determined to drive the seat along the optimal path. A feedforward speed control using a stepping motor was implemented, and the reproducibility of the trajectory was evaluated based on the geometric constraints of the mechanism. A load-bearing experiment with weights fixed to the seat was conducted to assess the trajectory accuracy under different conditions. Results showed that the reproduction errors for the hip and knee trajectories remained within approximately 4 percent of the seat's total displacement, demonstrating high fidelity to the target paths. In addition, durability testing, thermal safety evaluation, and risk assessment confirmed the reliability and safety of the system for indoor use. These findings suggest that the proposed design offers a promising approach for developing assistive technologies that adapt to individual physical characteristics, with potential applications in elderly care and rehabilitation.
Event-Driven Simulation for Rapid Iterative Development of Distributed Space Flight Software
This paper presents the design, development, and application of a novel space simulation environment for rapidly prototyping and testing flight software for distributed space systems. The environment combines the flexibility, determinism, and observability of software-only simulation with the fidelity and depth normally attained only by real-time hardware-in-the-loop testing. Ultimately, this work enables an engineering process in which flight software is continuously improved and delivered in its final, flight-ready form, and which reduces the cost of design changes and software revisions with respect to a traditional linear development process. Three key methods not found in existing tools enable this environment's novel capabilities: first, a hybrid event-driven simulation architecture that combines continuous-time and discrete-event simulation paradigms; second, a lightweight application-layer software virtualization design that allows executing compiled flight software binaries while modeling process scheduling, input/output, and memory use; and third, high-fidelity models for the multi-spacecraft space environment, including for wireless communication, relative sensing such as differential GPS and cameras, and flight computer health metrics like heap exhaustion and fragmentation. The simulation environment's capabilities are applied to the iterative development and testing of two flight-ready software packages: the guidance, navigation, and control software for the VISORS mission, and the Stanford Space Rendezvous Laboratory software kit for rendezvous and proximity operations. Results from 33 months of flight software development demonstrate the use of this simulation environment to rapidly and reliably identify and resolve defects, characterize navigation and control performance, and scrutinize implementation details like memory allocation and inter-spacecraft network protocols.
comment: IEEE Aerospace Conference 2025
A Robot Simulation Environment for Virtual Reality Enhanced Underwater Manipulation and Seabed Intervention Tasks ICRA
This paper presents the (MARUN)2 underwater robotic simulator. The simulator architecture enables seamless integration with the ROS-based mission software and web-based user interface of URSULA, a squid inspired biomimetic robot designed for dexterous underwater manipulation and seabed intervention tasks. (MARUN)2 utilizes the Unity game engine for physics-based rigid body dynamic simulation and underwater environment modeling. Utilizing Unity as the simulation environment enables the integration of virtual reality and haptic feedback capabilities for a more immersive and realistic experience for improved operator dexterity and experience. The utility of the simulator and improved dexterity provided by the VR module is validated through user experiments.
comment: 7 pages with 8 figures; presented in the AQ2UASIM: Advancing Quantitative and Qualitative SIMulators for Marine Applications workshop held as a part of the IEEE International Conference on Robotics and Automation (ICRA) 2025
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Multimodal large language models (MLLMs) have recently gained attention for their generalization and reasoning capabilities in Vision-and-Language Navigation (VLN) tasks, leading to the rise of MLLM-driven navigators. However, MLLMs are vulnerable to jailbreak attacks, where crafted prompts bypass safety mechanisms and trigger undesired outputs. In embodied scenarios, such vulnerabilities pose greater risks: unlike plain text models that generate toxic content, embodied agents may interpret malicious instructions as executable commands, potentially leading to real-world harm. In this paper, we present the first systematic jailbreak attack paradigm targeting MLLM-driven navigator. We propose a three-tiered attack framework and construct malicious queries across four intent categories, concatenated with standard navigation instructions. In the Matterport3D simulator, we evaluate navigation agents powered by five MLLMs and report an average attack success rate over 90%. To test real-world feasibility, we replicate the attack on a physical robot. Our results show that even well-crafted prompts can induce harmful actions and intents in MLLMs, posing risks beyond toxic output and potentially leading to physical harm.
comment: 8 pages, 4 figures
Depth Transfer: Learning to See Like a Simulator for Real-World Drone Navigation
Sim-to-real transfer is a fundamental challenge in robot reinforcement learning. Discrepancies between simulation and reality can significantly impair policy performance, especially if it receives high-dimensional inputs such as dense depth estimates from vision. We propose a novel depth transfer method based on domain adaptation to bridge the visual gap between simulated and real-world depth data. A Variational Autoencoder (VAE) is first trained to encode ground-truth depth images from simulation into a latent space, which serves as input to a reinforcement learning (RL) policy. During deployment, the encoder is refined to align stereo depth images with this latent space, enabling direct policy transfer without fine-tuning. We apply our method to the task of autonomous drone navigation through cluttered environments. Experiments in IsaacGym show that our method nearly doubles the obstacle avoidance success rate when switching from ground-truth to stereo depth input. Furthermore, we demonstrate successful transfer to the photo-realistic simulator AvoidBench using only IsaacGym-generated stereo data, achieving superior performance compared to state-of-the-art baselines. Real-world evaluations in both indoor and outdoor environments confirm the effectiveness of our approach, enabling robust and generalizable depth-based navigation across diverse domains.
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, limiting their applicability to tasks where historical context is crucial for disambiguating current observations. This limitation hinders performance in long-horizon sequential manipulation tasks where the correct action depends on past events not fully captured by the current state. To address this fundamental challenge, we introduce Mamba Temporal Imitation Learning (MTIL), a novel approach that leverages the recurrent state dynamics inherent in State Space Models (SSMs), specifically the Mamba architecture. MTIL encodes the entire trajectory history into a compressed hidden state, conditioning action predictions on this comprehensive temporal context alongside current multi-modal observations. Through extensive experiments on simulated benchmarks (ACT dataset tasks, Robomimic, LIBERO) and real-world sequential manipulation tasks specifically designed to probe temporal dependencies, MTIL significantly outperforms state-of-the-art methods like ACT and Diffusion Policy. Our findings affirm the necessity of full temporal context for robust sequential decision-making and validate MTIL as a powerful approach that transcends the inherent limitations of Markovian imitation learning
comment: 16 pages,6 figures,Submitted to IEEE RAL
Is Semantic SLAM Ready for Embedded Systems ? A Comparative Survey
In embedded systems, robots must perceive and interpret their environment efficiently to operate reliably in real-world conditions. Visual Semantic SLAM (Simultaneous Localization and Mapping) enhances standard SLAM by incorporating semantic information into the map, enabling more informed decision-making. However, implementing such systems on resource-limited hardware involves trade-offs between accuracy, computing efficiency, and power usage. This paper provides a comparative review of recent Semantic Visual SLAM methods with a focus on their applicability to embedded platforms. We analyze three main types of architectures - Geometric SLAM, Neural Radiance Fields (NeRF), and 3D Gaussian Splatting - and evaluate their performance on constrained hardware, specifically the NVIDIA Jetson AGX Orin. We compare their accuracy, segmentation quality, memory usage, and energy consumption. Our results show that methods based on NeRF and Gaussian Splatting achieve high semantic detail but demand substantial computing resources, limiting their use on embedded devices. In contrast, Semantic Geometric SLAM offers a more practical balance between computational cost and accuracy. The review highlights a need for SLAM algorithms that are better adapted to embedded environments, and it discusses key directions for improving their efficiency through algorithm-hardware co-design.
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, visuospatial cognition - reasoning about spatial layouts, relations, and dynamics - remains a significant challenge. Existing models often lack the necessary architectural components and specialized training data for fine-grained spatial understanding. We introduce ViCA2 (Visuospatial Cognitive Assistant 2), a novel MLLM designed to enhance spatial reasoning. ViCA2 features a dual vision encoder architecture integrating SigLIP for semantics and Hiera for spatial structure, coupled with a token ratio control mechanism for efficiency. We also developed ViCA-322K, a new large-scale dataset with over 322,000 spatially grounded question-answer pairs for targeted instruction tuning. On the challenging VSI-Bench benchmark, our ViCA2-7B model achieves a state-of-the-art average score of 56.8, significantly surpassing larger open-source models (e.g., LLaVA-NeXT-Video-72B, 40.9) and leading proprietary models (Gemini-1.5 Pro, 45.4). This demonstrates the effectiveness of our approach in achieving strong visuospatial intelligence with a compact model. We release ViCA2, its codebase, and the ViCA-322K dataset to facilitate further research.
comment: 26 pages, 19 figures, 4 tables. Code, models, and dataset are available at our project page: https://github.com/nkkbr/ViCA
Adaptive MPC-based quadrupedal robot control under periodic disturbances
Recent advancements in adaptive control for reference trajectory tracking enable quadrupedal robots to perform locomotion tasks under challenging conditions. There are methods enabling the estimation of the external disturbances in terms of forces and torques. However, a specific case of disturbances that are periodic was not explicitly tackled in application to quadrupeds. This work is devoted to the estimation of the periodic disturbances with a lightweight regressor using simplified robot dynamics and extracting the disturbance properties in terms of the magnitude and frequency. Experimental evidence suggests performance improvement over the baseline static disturbance compensation. All source files, including simulation setups, code, and calculation scripts, are available on GitHub at https://github.com/aidagroup/quad-periodic-mpc.
A universal policy wrapper with guarantees
We introduce a universal policy wrapper for reinforcement learning agents that ensures formal goal-reaching guarantees. In contrast to standard reinforcement learning algorithms that excel in performance but lack rigorous safety assurances, our wrapper selectively switches between a high-performing base policy -- derived from any existing RL method -- and a fallback policy with known convergence properties. Base policy's value function supervises this switching process, determining when the fallback policy should override the base policy to ensure the system remains on a stable path. The analysis proves that our wrapper inherits the fallback policy's goal-reaching guarantees while preserving or improving upon the performance of the base policy. Notably, it operates without needing additional system knowledge or online constrained optimization, making it readily deployable across diverse reinforcement learning architectures and tasks.
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically-backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks demonstrates enhanced performance while maintaining stability guarantees.
Structureless VIO
Visual odometry (VO) is typically considered as a chicken-and-egg problem, as the localization and mapping modules are tightly-coupled. The estimation of visual map relies on accurate localization information. Meanwhile, localization requires precise map points to provide motion constraints. This classical design principle is naturally inherited by visual-inertial odometry (VIO). Efficient localization solution that does not require a map has not been fully investigated. To this end, we propose a novel structureless VIO, where the visual map is removed from the odometry framework. Experimental results demonstrated that, compared to the structure-based VIO baseline, our structureless VIO not only substantially improves computational efficiency but also has advantages in accuracy.
Robust Planning for Autonomous Driving via Mixed Adversarial Diffusion Predictions ICRA
We describe a robust planning method for autonomous driving that mixes normal and adversarial agent predictions output by a diffusion model trained for motion prediction. We first train a diffusion model to learn an unbiased distribution of normal agent behaviors. We then generate a distribution of adversarial predictions by biasing the diffusion model at test time to generate predictions that are likely to collide with a candidate plan. We score plans using expected cost with respect to a mixture distribution of normal and adversarial predictions, leading to a planner that is robust against adversarial behaviors but not overly conservative when agents behave normally. Unlike current approaches, we do not use risk measures that over-weight adversarial behaviors while placing little to no weight on low-cost normal behaviors or use hard safety constraints that may not be appropriate for all driving scenarios. We show the effectiveness of our method on single-agent and multi-agent jaywalking scenarios as well as a red light violation scenario.
comment: IEEE International Conference on Robotics and Automation (ICRA) 2025
Visuospatial Cognitive Assistant
Video-based spatial cognition is vital for robotics and embodied AI but challenges current Vision-Language Models (VLMs). This paper makes two key contributions. First, we introduce ViCA (Visuospatial Cognitive Assistant)-322K, a diverse dataset of 322,003 QA pairs from real-world indoor videos (ARKitScenes, ScanNet, ScanNet++), offering supervision for 3D metadata-grounded queries and video-based complex reasoning. Second, we develop ViCA-7B, fine-tuned on ViCA-322K, which achieves new state-of-the-art on all eight VSI-Bench tasks, outperforming existing models, including larger ones (e.g., +26.1 on Absolute Distance). For interpretability, we present ViCA-Thinking-2.68K, a dataset with explicit reasoning chains, and fine-tune ViCA-7B to create ViCA-7B-Thinking, a model that articulates its spatial reasoning. Our work highlights the importance of targeted data and suggests paths for improved temporal-spatial modeling. We release all resources to foster research in robust visuospatial intelligence.
comment: 31 pages, 10 figures, 6 tables. The implementation and fine-tuned model (ViCA-7B) are publicly available at https://huggingface.co/nkkbr/ViCA. The ViCA-322K dataset can be found at https://huggingface.co/datasets/nkkbr/ViCA-322K, and the ViCA-Thinking-2.68K dataset is at https://huggingface.co/datasets/nkkbr/ViCA-thinking-2.68k
Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization
Despite over a decade of development, autonomous driving trajectory planning in complex urban environments continues to encounter significant challenges. These challenges include the difficulty in accommodating the multi-modal nature of trajectories, the limitations of single expert in managing diverse scenarios, and insufficient consideration of environmental interactions. To address these issues, this paper introduces the EMoE-Planner, which incorporates three innovative approaches. Firstly, the Explicit MoE (Mixture of Experts) dynamically selects specialized experts based on scenario-specific information through a shared scene router. Secondly, the planner utilizes scene-specific queries to provide multi-modal priors, directing the model's focus towards relevant target areas. Lastly, it enhances the prediction model and loss calculation by considering the interactions between the ego vehicle and other agents, thereby significantly boosting planning performance. Comparative experiments were conducted using the Nuplan dataset against the state-of-the-art methods. The simulation results demonstrate that our model consistently outperforms SOTA models across nearly all test scenarios.
comment: Main text 9 pages, appendices 2 pages with 7 figures
DNOI-4DRO: Deep 4D Radar Odometry with Differentiable Neural-Optimization Iterations
A novel learning-optimization-combined 4D radar odometry model, named DNOI-4DRO, is proposed in this paper. The proposed model seamlessly integrates traditional geometric optimization with end-to-end neural network training, leveraging an innovative differentiable neural-optimization iteration operator. In this framework, point-wise motion flow is first estimated using a neural network, followed by the construction of a cost function based on the relationship between point motion and pose in 3D space. The radar pose is then refined using Gauss-Newton updates. Additionally, we design a dual-stream 4D radar backbone that integrates multi-scale geometric features and clustering-based class-aware features to enhance the representation of sparse 4D radar point clouds. Extensive experiments on the VoD and Snail-Radar datasets demonstrate the superior performance of our model, which outperforms recent classical and learning-based approaches. Notably, our method even achieves results comparable to A-LOAM with mapping optimization using LiDAR point clouds as input. Our models and code will be publicly released.
comment: 16 pages,10 figures
PartDexTOG: Generating Dexterous Task-Oriented Grasping via Language-driven Part Analysis
Task-oriented grasping is a crucial yet challenging task in robotic manipulation. Despite the recent progress, few existing methods address task-oriented grasping with dexterous hands. Dexterous hands provide better precision and versatility, enabling robots to perform task-oriented grasping more effectively. In this paper, we argue that part analysis can enhance dexterous grasping by providing detailed information about the object's functionality. We propose PartDexTOG, a method that generates dexterous task-oriented grasps via language-driven part analysis. Taking a 3D object and a manipulation task represented by language as input, the method first generates the category-level and part-level grasp descriptions w.r.t the manipulation task by LLMs. Then, a category-part conditional diffusion model is developed to generate a dexterous grasp for each part, respectively, based on the generated descriptions. To select the most plausible combination of grasp and corresponding part from the generated ones, we propose a measure of geometric consistency between grasp and part. We show that our method greatly benefits from the open-world knowledge reasoning on object parts by LLMs, which naturally facilitates the learning of grasp generation on objects with different geometry and for different manipulation tasks. Our method ranks top on the OakInk-shape dataset over all previous methods, improving the Penetration Volume, the Grasp Displace, and the P-FID over the state-of-the-art by $3.58\%$, $2.87\%$, and $41.43\%$, respectively. Notably, it demonstrates good generality in handling novel categories and tasks.
Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning
Human behavior is fundamentally shaped by visual perception -- our ability to interact with the world depends on actively gathering relevant information and adapting our movements accordingly. Behaviors like searching for objects, reaching, and hand-eye coordination naturally emerge from the structure of our sensory system. Inspired by these principles, we introduce Perceptive Dexterous Control (PDC), a framework for vision-driven dexterous whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues, without relying on privileged state information (e.g., 3D object positions and geometries). This perception-as-interface paradigm enables learning a single policy to perform multiple household tasks, including reaching, grasping, placing, and articulated object manipulation. We also show that training from scratch with reinforcement learning can produce emergent behaviors such as active search. These results demonstrate how vision-driven control and complex tasks induce human-like behaviors and can serve as the key ingredients in closing the perception-action loop for animation, robotics, and embodied AI.
comment: Project page: https://zhengyiluo.github.io/PDC
Real-Time Spatial Reasoning by Mobile Robots for Reconstruction and Navigation in Dynamic LiDAR Scenes
Our brain has an inner global positioning system which enables us to sense and navigate 3D spaces in real time. Can mobile robots replicate such a biological feat in a dynamic environment? We introduce the first spatial reasoning framework for real-time surface reconstruction and navigation that is designed for outdoor LiDAR scanning data captured by ground mobile robots and capable of handling moving objects such as pedestrians. Our reconstruction-based approach is well aligned with the critical cellular functions performed by the border vector cells (BVCs) over all layers of the medial entorhinal cortex (MEC) for surface sensing and tracking. To address the challenges arising from blurred boundaries resulting from sparse single-frame LiDAR points and outdated data due to object movements, we integrate real-time single-frame mesh reconstruction, via visibility reasoning, with robot navigation assistance through on-the-fly 3D free space determination. This enables continuous and incremental updates of the scene and free space across multiple frames. Key to our method is the utilization of line-of-sight (LoS) vectors from LiDAR, which enable real-time surface normal estimation, as well as robust and instantaneous per-voxel free space updates. We showcase two practical applications: real-time 3D scene reconstruction and autonomous outdoor robot navigation in real-world conditions. Comprehensive experiments on both synthetic and real scenes highlight our method's superiority in speed and quality over existing real-time LiDAR processing approaches.
Design of a 3-DOF Hopping Robot with an Optimized Gearbox: An Intermediate Platform Toward Bipedal Robots
This paper presents a 3-DOF hopping robot with a human-like lower-limb joint configuration and a flat foot, capable of performing dynamic and repetitive jumping motions. To achieve both high torque output and a large hollow shaft diameter for efficient cable routing, a compact 3K compound planetary gearbox was designed using mixed-integer nonlinear programming for gear tooth optimization. To meet performance requirements within the constrained joint geometry, all major components-including the actuator, motor driver, and communication interface-were custom-designed. The robot weighs 12.45 kg, including a dummy mass, and measures 840 mm in length when the knee joint is fully extended. A reinforcement learning-based controller was employed, and robot's performance was validated through hardware experiments, demonstrating stable and repetitive hopping motions in response to user inputs. These experimental results indicate that the platform serves as a solid foundation for future bipedal robot development.
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction
Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and image information into sequential control actions. However, these models often underperform in open-world scenarios, as they are predominantly trained on successful expert demonstrations and exhibit a limited capacity for failure recovery. In this work, we present a Robotic Failure Analysis and Correction (RoboFAC) framework to address this issue. Firstly, we construct RoboFAC dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 16 diverse tasks and 53 scenes in both simulation and real-world environments. Leveraging our dataset, we develop RoboFAC model, which is capable of Task Understanding, Failure Analysis and Failure Correction. Experimental results demonstrate that the RoboFAC model outperforms GPT-4o by 34.1% on our evaluation benchmark. Furthermore, we integrate the RoboFAC model into a real-world VLA control pipeline as an external supervision providing correction instructions, yielding a 29.1% relative improvement on average on four real-world tasks. The results show that our RoboFAC framework effectively handles robotic failures and assists the VLA model in recovering from failures.
Learning Impact-Rich Rotational Maneuvers via Centroidal Velocity Rewards and Sim-to-Real Techniques: A One-Leg Hopper Flip Case Study
Dynamic rotational maneuvers, such as front flips, inherently involve large angular momentum generation and intense impact forces, presenting major challenges for reinforcement learning and sim-to-real transfer. In this work, we propose a general framework for learning and deploying impact-rich, rotation-intensive behaviors through centroidal velocity-based rewards and actuator-aware sim-to-real techniques. We identify that conventional link-level reward formulations fail to induce true whole-body rotation and introduce a centroidal angular velocity reward that accurately captures system-wide rotational dynamics. To bridge the sim-to-real gap under extreme conditions, we model motor operating regions (MOR) and apply transmission load regularization to ensure realistic torque commands and mechanical robustness. Using the one-leg hopper front flip as a representative case study, we demonstrate the first successful hardware realization of a full front flip. Our results highlight that incorporating centroidal dynamics and actuator constraints is critical for reliably executing highly dynamic motions.
Behavior Synthesis via Contact-Aware Fisher Information Maximization
Contact dynamics hold immense amounts of information that can improve a robot's ability to characterize and learn about objects in their environment through interactions. However, collecting information-rich contact data is challenging due to its inherent sparsity and non-smooth nature, requiring an active approach to maximize the utility of contacts for learning. In this work, we investigate an optimal experimental design approach to synthesize robot behaviors that produce contact-rich data for learning. Our approach derives a contact-aware Fisher information measure that characterizes information-rich contact behaviors that improve parameter learning. We observe emergent robot behaviors that are able to excite contact interactions that efficiently learns object parameters across a range of parameter learning examples. Last, we demonstrate the utility of contact-awareness for learning parameters through contact-seeking behaviors on several robotic experiments.
comment: In Robotics Science and Systems 2025
Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding
Multimodal large language models (MLLMs) have demonstrated remarkable abilities in comprehending visual input alongside text input. Typically, these models are trained on extensive data sourced from the internet, which are sufficient for general tasks such as scene understanding and question answering. However, they often underperform on specialized tasks where online data is scarce, such as determining spatial relationships between objects or localizing unique target objects within a group of objects sharing similar features. In response to this challenge, we introduce the SUN-Spot v2.0 dataset1, now comprising a total of 90k image-caption pairs and additional annotations on the landmark objects. Each image-caption pair utilizes Set-of-Marks prompting as an additional indicator, mapping each landmark object in the image to the corresponding object mentioned in the caption. Furthermore, we present Spatial-LLaVA, an MLLM trained on conversational data generated by a state-of-the-art language model using the SUNSpot v2.0 dataset. Our approach ensures a robust alignment between the objects in the images and their corresponding object mentions in the captions, enabling our model to learn spatial referring expressions without bias from the semantic information of the objects. Spatial-LLaVA outperforms previous methods by 3.15% on the zero-shot Visual Spatial Reasoning benchmark dataset. Spatial-LLaVA is specifically designed to precisely understand spatial referring expressions, making it highly applicable for tasks in real-world scenarios such as autonomous navigation and interactive robotics, where precise object recognition is critical.
Towards Robust Autonomous Landing Systems: Iterative Solutions and Key Lessons Learned
Uncrewed Aerial Vehicles (UAVs) have become a focal point of research, with both established companies and startups investing heavily in their development. This paper presents our iterative process in developing a robust autonomous marker-based landing system, highlighting the key challenges encountered and the solutions implemented. It reviews existing systems for autonomous landing processes, and through this aims to contribute to the community by sharing insights and challenges faced during development and testing.
Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios
Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
comment: IEEE Intelligent Vehicles Symposium (IV 2025)
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset CVPR 2025
We introduce the Digital Twin Catalog (DTC), a new large-scale photorealistic 3D object digital twin dataset. A digital twin of a 3D object is a highly detailed, virtually indistinguishable representation of a physical object, accurately capturing its shape, appearance, physical properties, and other attributes. Recent advances in neural-based 3D reconstruction and inverse rendering have significantly improved the quality of 3D object reconstruction. Despite these advancements, there remains a lack of a large-scale, digital twin-quality real-world dataset and benchmark that can quantitatively assess and compare the performance of different reconstruction methods, as well as improve reconstruction quality through training or fine-tuning. Moreover, to democratize 3D digital twin creation, it is essential to integrate creation techniques with next-generation egocentric computing platforms, such as AR glasses. Currently, there is no dataset available to evaluate 3D object reconstruction using egocentric captured images. To address these gaps, the DTC dataset features 2,000 scanned digital twin-quality 3D objects, along with image sequences captured under different lighting conditions using DSLR cameras and egocentric AR glasses. This dataset establishes the first comprehensive real-world evaluation benchmark for 3D digital twin creation tasks, offering a robust foundation for comparing and improving existing reconstruction methods. The DTC dataset is already released at https://www.projectaria.com/datasets/dtc/ and we will also make the baseline evaluations open-source.
comment: accepted to CVPR 2025 (Highlight). Dataset page: https://www.projectaria.com/datasets/dtc/
Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data
Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two essential techniques: \emph{i)} experience rehearsal and \emph{ii)} execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves a 102.8\% relative improvement in aggregate score over learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.
Continuously Optimizing Radar Placement with Model Predictive Path Integrals
Continuously optimizing sensor placement is essential for precise target localization in various military and civilian applications. While information theory has shown promise in optimizing sensor placement, many studies oversimplify sensor measurement models or neglect dynamic constraints of mobile sensors. To address these challenges, we employ a range measurement model that incorporates radar parameters and radar-target distance, coupled with Model Predictive Path Integral (MPPI) control to manage complex environmental obstacles and dynamic constraints. We compare the proposed approach against stationary radars or simplified range measurement models based on the root mean squared error (RMSE) of the Cubature Kalman Filter (CKF) estimator for the targets' state. Additionally, we visualize the evolving geometry of radars and targets over time, highlighting areas of highest measurement information gain, demonstrating the strengths of the approach. The proposed strategy outperforms stationary radars and simplified range measurement models in target localization, achieving a 38-74% reduction in mean RMSE and a 33-79% reduction in the upper tail of the 90% Highest Density Interval (HDI) over 500 Monte Carl (MC) trials across all time steps. Code will be made publicly available upon acceptance.
comment: Accepted to IEEE Aerospace and Electronic Systems
DYNUS: Uncertainty-aware Trajectory Planner in Dynamic Unknown Environments
This paper introduces DYNUS, an uncertainty-aware trajectory planner designed for dynamic unknown environments. Operating in such settings presents many challenges -- most notably, because the agent cannot predict the ground-truth future paths of obstacles, a previously planned trajectory can become unsafe at any moment, requiring rapid replanning to avoid collisions. Recently developed planners have used soft-constraint approaches to achieve the necessary fast computation times; however, these methods do not guarantee collision-free paths even with static obstacles. In contrast, hard-constraint methods ensure collision-free safety, but typically have longer computation times. To address these issues, we propose three key contributions. First, the DYNUS Global Planner (DGP) and Temporal Safe Corridor Generation operate in spatio-temporal space and handle both static and dynamic obstacles in the 3D environment. Second, the Safe Planning Framework leverages a combination of exploratory, safe, and contingency trajectories to flexibly re-route when potential future collisions with dynamic obstacles are detected. Finally, the Fast Hard-Constraint Local Trajectory Formulation uses a variable elimination approach to reduce the problem size and enable faster computation by pre-computing dependencies between free and dependent variables while still ensuring collision-free trajectories. We evaluated DYNUS in a variety of simulations, including dense forests, confined office spaces, cave systems, and dynamic environments. Our experiments show that DYNUS achieves a success rate of 100% and travel times that are approximately 25.0% faster than state-of-the-art methods. We also evaluated DYNUS on multiple platforms -- a quadrotor, a wheeled robot, and a quadruped -- in both simulation and hardware experiments.
comment: 20 pages, 30 figures, Under review at IEEE Transactions on Robotics
IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths when compared to state-of-the-art baselines, and lastly deploy our learned policy on hardware. Our simulation training and testing code is available at https://ir2-explore.github.io.
comment: \c{opyright} 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Shape-Space Deformer: Unified Visuo-Tactile Representations for Robotic Manipulation of Deformable Objects ICRA2025
Accurate modelling of object deformations is crucial for a wide range of robotic manipulation tasks, where interacting with soft or deformable objects is essential. Current methods struggle to generalise to unseen forces or adapt to new objects, limiting their utility in real-world applications. We propose Shape-Space Deformer, a unified representation for encoding a diverse range of object deformations using template augmentation to achieve robust, fine-grained reconstructions that are resilient to outliers and unwanted artefacts. Our method improves generalization to unseen forces and can rapidly adapt to novel objects, significantly outperforming existing approaches. We perform extensive experiments to test a range of force generalisation settings and evaluate our method's ability to reconstruct unseen deformations, demonstrating significant improvements in reconstruction accuracy and robustness. Our approach is suitable for real-time performance, making it ready for downstream manipulation applications.
comment: Accepted in ICRA2025
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Recent advances in creative AI have enabled the synthesis of high-fidelity images and videos conditioned on language instructions. Building on these developments, text-to-video diffusion models have evolved into embodied world models (EWMs) capable of generating physically plausible scenes from language commands, effectively bridging vision and action in embodied AI applications. This work addresses the critical challenge of evaluating EWMs beyond general perceptual metrics to ensure the generation of physically grounded and action-consistent behaviors. We propose the Embodied World Model Benchmark (EWMBench), a dedicated framework designed to evaluate EWMs based on three key aspects: visual scene consistency, motion correctness, and semantic alignment. Our approach leverages a meticulously curated dataset encompassing diverse scenes and motion patterns, alongside a comprehensive multi-dimensional evaluation toolkit, to assess and compare candidate models. The proposed benchmark not only identifies the limitations of existing video generation models in meeting the unique requirements of embodied tasks but also provides valuable insights to guide future advancements in the field. The dataset and evaluation tools are publicly available at https://github.com/AgibotTech/EWMBench.
comment: Website: https://github.com/AgibotTech/EWMBench
Omnigrasp: Grasping Diverse Objects with Simulated Humanoids
We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object's trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them to follow randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simplistic reward, state, and object representations, our method shows favorable scalability on diverse objects and trajectories. For training, we do not need a dataset of paired full-body motion and object trajectories. At test time, we only require the object mesh and desired trajectories for grasping and transporting. To demonstrate the capabilities of our method, we show state-of-the-art success rates in following object trajectories and generalizing to unseen objects. Code and models will be released.
comment: Project page: https://www.zhengyiluo.com/Omnigrasp/
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
Humanoid robots, capable of assuming human roles in various workplaces, have become essential to embodied intelligence. However, as robots with complex physical structures, learning a control model that can operate robustly across diverse environments remains inherently challenging, particularly under the discrepancies between training and deployment environments. In this study, we propose HWC-Loco, a robust whole-body control algorithm tailored for humanoid locomotion tasks. By reformulating policy learning as a robust optimization problem, HWC-Loco explicitly learns to recover from safety-critical scenarios. While prioritizing safety guarantees, overly conservative behavior can compromise the robot's ability to complete the given tasks. To tackle this challenge, HWC-Loco leverages a hierarchical policy for robust control. This policy can dynamically resolve the trade-off between goal-tracking and safety recovery, guided by human behavior norms and dynamic constraints. To evaluate the performance of HWC-Loco, we conduct extensive comparisons against state-of-the-art humanoid control models, demonstrating HWC-Loco's superior performance across diverse terrains, robot structures, and locomotion tasks under both simulated and real-world environments.
This&That: Language-Gesture Controlled Video Generation for Robot Planning
Clear, interpretable instructions are invaluable when attempting any complex task. Good instructions help to clarify the task and even anticipate the steps needed to solve it. In this work, we propose a robot learning framework for communicating, planning, and executing a wide range of tasks, dubbed This&That. This&That solves general tasks by leveraging video generative models, which, through training on internet-scale data, contain rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intent, and 3) translating visual plans into robot actions. This&That uses language-gesture conditioning to generate video predictions, as a succinct and unambiguous alternative to existing language-only methods, especially in complex and uncertain environments. These video predictions are then fed into a behavior cloning architecture dubbed Diffusion Video to Action (DiVA), which outperforms prior state-of-the-art behavior cloning and video-based planning methods by substantial margins.
Hand Dominance and Congruence for Wrist-worn Haptics using Custom Voice-Coil Actuation
During virtual interactions, rendering haptic feedback on a remote location (like the wrist) instead of the fingertips freeing users' hands from mechanical devices. This allows for real interactions while still providing information regarding the mechanical properties of virtual objects. In this paper, we present CoWrHap -- a novel wrist-worn haptic device with custom-made voice coil actuation to render force feedback. Then, we investigate the impact of asking participants to use their dominant or non-dominant hand for virtual interactions and the best mapping between the active hand and the wrist receiving the haptic feedback, which can be defined as hand-wrist congruence through a user experiment based on a stiffness discrimination task. Our results show that participants performed the tasks (i) better with non-congruent mapping but reported better experiences with congruent mapping, and (ii) with no statistical difference in terms of hand dominance but reported better user experience and enjoyment using their dominant hands. This study indicates that participants can perceive mechanical properties via haptic feedback provided through CoWrHap.
comment: 6 pages, 7 figures
Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding ICRA 2025
Lifelong Multi-Agent Path Finding (LMAPF) repeatedly finds collision-free paths for multiple agents that are continually assigned new goals when they reach current ones. Recently, this field has embraced learning-based methods, which reactively generate single-step actions based on individual local observations. However, it is still challenging for them to match the performance of the best search-based algorithms, especially in large-scale settings. This work proposes an imitation-learning-based LMAPF solver that introduces a novel communication module as well as systematic single-step collision resolution and global guidance techniques. Our proposed solver, Scalable Imitation Learning for LMAPF (SILLM), inherits the fast reasoning speed of learning-based methods and the high solution quality of search-based methods with the help of modern GPUs. Across six large-scale maps with up to 10,000 agents and varying obstacle structures, SILLM surpasses the best learning- and search-based baselines, achieving average throughput improvements of 137.7% and 16.0%, respectively. Furthermore, SILLM also beats the winning solution of the 2023 League of Robot Runners, an international LMAPF competition. Finally, we validated SILLM with 10 real robots and 100 virtual robots in a mock warehouse environment.
comment: Accepted by ICRA 2025
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings
We propose PseudoNeg-MAE, a novel self-supervised learning framework that enhances global feature representation of point cloud masked autoencoder by making them both discriminative and sensitive to transformations. Traditional contrastive learning methods focus on achieving invariance, discarding transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between original and transformed inputs. However, they report an invariant-collapse phenomenon, where the predictor degenerates into identity mappings, resulting in latent representations that have limited variation across transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations. PseudoNeg-MAE uses a parametric network COPE, which learns the localized displacements caused by transformations within the latent space. However, jointly training COPE with the MAE leads to undesirable trivial solutions where COPE outputs collapse to an identity. To address this, we propose a loss that uses transformation-conditioned pseudo-negatives, to penalize such trivial invariant solutions. We validate PseudoNeg-MAE on shape classification and relative pose estimation tasks, where it achieves competitive performance on the ModelNet40 and ScanObjectNN datasets under challenging evaluation protocols and demonstrates superior accuracy in estimating relative poses compared to supervised methods.
Multiagent Systems
Event-Driven Simulation for Rapid Iterative Development of Distributed Space Flight Software
This paper presents the design, development, and application of a novel space simulation environment for rapidly prototyping and testing flight software for distributed space systems. The environment combines the flexibility, determinism, and observability of software-only simulation with the fidelity and depth normally attained only by real-time hardware-in-the-loop testing. Ultimately, this work enables an engineering process in which flight software is continuously improved and delivered in its final, flight-ready form, and which reduces the cost of design changes and software revisions with respect to a traditional linear development process. Three key methods not found in existing tools enable this environment's novel capabilities: first, a hybrid event-driven simulation architecture that combines continuous-time and discrete-event simulation paradigms; second, a lightweight application-layer software virtualization design that allows executing compiled flight software binaries while modeling process scheduling, input/output, and memory use; and third, high-fidelity models for the multi-spacecraft space environment, including for wireless communication, relative sensing such as differential GPS and cameras, and flight computer health metrics like heap exhaustion and fragmentation. The simulation environment's capabilities are applied to the iterative development and testing of two flight-ready software packages: the guidance, navigation, and control software for the VISORS mission, and the Stanford Space Rendezvous Laboratory software kit for rendezvous and proximity operations. Results from 33 months of flight software development demonstrate the use of this simulation environment to rapidly and reliably identify and resolve defects, characterize navigation and control performance, and scrutinize implementation details like memory allocation and inter-spacecraft network protocols.
comment: IEEE Aerospace Conference 2025
Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent Systems ACL 2025
Multi-agent collaboration has emerged as a pivotal paradigm for addressing complex, distributed tasks in large language model (LLM)-driven applications. While prior research has focused on high-level architectural frameworks, the granular mechanisms governing agents, critical to performance and scalability, remain underexplored. This study systematically investigates four dimensions of collaboration strategies: (1) agent governance, (2) participation control, (3) interaction dynamics, and (4) dialogue history management. Through rigorous experimentation under two context-dependent scenarios: Distributed Evidence Integration (DEI) and Structured Evidence Synthesis (SES), we quantify the impact of these strategies on both task accuracy and computational efficiency. Our findings reveal that centralized governance, instructor-led participation, ordered interaction patterns, and instructor-curated context summarization collectively optimize the trade-off between decision quality and resource utilization with the support of the proposed Token-Accuracy Ratio (TAR). This work establishes a foundation for designing adaptive, scalable multi-agent systems, shifting the focus from structural novelty to strategic interaction mechanics.
comment: ACL 2025
Steady-State Strategy Synthesis for Swarms of Autonomous Agents
Steady-state synthesis aims to construct a policy for a given MDP $D$ such that the long-run average frequencies of visits to the vertices of $D$ satisfy given numerical constraints. This problem is solvable in polynomial time, and memoryless policies are sufficient for approximating an arbitrary frequency vector achievable by a general (infinite-memory) policy. We study the steady-state synthesis problem for multiagent systems, where multiple autonomous agents jointly strive to achieve a suitable frequency vector. We show that the problem for multiple agents is computationally hard (PSPACE or NP hard, depending on the variant), and memoryless strategy profiles are insufficient for approximating achievable frequency vectors. Furthermore, we prove that even evaluating the frequency vector achieved by a given memoryless profile is computationally hard. This reveals a severe barrier to constructing an efficient synthesis algorithm, even for memoryless profiles. Nevertheless, we design an efficient and scalable synthesis algorithm for a subclass of full memoryless profiles, and we evaluate this algorithm on a large class of randomly generated instances. The experimental results demonstrate a significant improvement against a naive algorithm based on strategy sharing.
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit rigorous comparisons against both single-LLM-based and established conventional methods. To address this critical gap, we introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches. MedAgentBoard encompasses four diverse medical task categories: (1) medical (visual) question answering, (2) lay summary generation, (3) structured Electronic Health Record (EHR) predictive modeling, and (4) clinical workflow automation, across text, medical images, and structured EHR data. Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, such as enhancing task completeness in clinical workflow automation, it does not consistently outperform advanced single LLMs (e.g., in textual medical QA) or, critically, specialized conventional methods that generally maintain better performance in tasks like medical VQA and EHR-based prediction. MedAgentBoard offers a vital resource and actionable insights, emphasizing the necessity of a task-specific, evidence-based approach to selecting and developing AI solutions in medicine. It underscores that the inherent complexity and overhead of multi-agent collaboration must be carefully weighed against tangible performance gains. All code, datasets, detailed prompts, and experimental results are open-sourced at https://medagentboard.netlify.app/.
ACPs: Agent Collaboration Protocols for the Internet of Agents
With the rapid advancement of artificial intelligence, the proliferation of autonomous agents has introduced new challenges in interoperability, scalability, and coordination. The Internet of Agents (IoA) aims to interconnect heterogeneous agents through standardized communication protocols, enabling seamless collaboration and intelligent task execution. However, existing agent communication protocols such as MCP, A2A, and ANP remain fragmented and scenario-specific. To address this gap, we propose Agent Collaboration Protocols (ACPs), a comprehensive protocol suite for the IoA. ACPs include registration, discovery, interaction, and tooling protocols to support trustable access, capability orchestration, and workflow construction. We present the architecture, key technologies, and application workflows of ACPs, and demonstrate its effectiveness in a collaborative restaurant booking scenario. ACPs lay the foundation for building a secure, open, and scalable agent internet infrastructure.
comment: 7 pages, 8 figures
Carbon Footprint Reduction for Sustainable Data Centers in Real-Time
As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. This requires a paradigm shift in optimizing power consumption in cooling and IT loads, shifting flexible loads based on the availability of renewable energy in the power grid, and leveraging battery storage from the uninterrupted power supply in data centers, using collaborative agents. The complex association between these optimization strategies and their dependencies on variable external factors like weather and the power grid carbon intensity makes this a hard problem. Currently, a real-time controller to optimize all these goals simultaneously in a dynamic real-world setting is lacking. We propose a Data Center Carbon Footprint Reduction (DC-CFR) multi-agent Reinforcement Learning (MARL) framework that optimizes data centers for the multiple objectives of carbon footprint reduction, energy consumption, and energy cost. The results show that the DC-CFR MARL agents effectively resolved the complex interdependencies in optimizing cooling, load shifting, and energy storage in real-time for various locations under real-world dynamic weather and grid carbon intensity conditions. DC-CFR significantly outperformed the industry standard ASHRAE controller with a considerable reduction in carbon emissions (14.5%), energy usage (14.4%), and energy cost (13.7%) when evaluated over one year across multiple geographical regions.
Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding ICRA 2025
Lifelong Multi-Agent Path Finding (LMAPF) repeatedly finds collision-free paths for multiple agents that are continually assigned new goals when they reach current ones. Recently, this field has embraced learning-based methods, which reactively generate single-step actions based on individual local observations. However, it is still challenging for them to match the performance of the best search-based algorithms, especially in large-scale settings. This work proposes an imitation-learning-based LMAPF solver that introduces a novel communication module as well as systematic single-step collision resolution and global guidance techniques. Our proposed solver, Scalable Imitation Learning for LMAPF (SILLM), inherits the fast reasoning speed of learning-based methods and the high solution quality of search-based methods with the help of modern GPUs. Across six large-scale maps with up to 10,000 agents and varying obstacle structures, SILLM surpasses the best learning- and search-based baselines, achieving average throughput improvements of 137.7% and 16.0%, respectively. Furthermore, SILLM also beats the winning solution of the 2023 League of Robot Runners, an international LMAPF competition. Finally, we validated SILLM with 10 real robots and 100 virtual robots in a mock warehouse environment.
comment: Accepted by ICRA 2025
Systems and Control (CS)
Robust Reinforcement Learning-Based Locomotion for Resource-Constrained Quadrupeds with Exteroceptive Sensing ICRA
Compact quadrupedal robots are proving increasingly suitable for deployment in real-world scenarios. Their smaller size fosters easy integration into human environments. Nevertheless, real-time locomotion on uneven terrains remains challenging, particularly due to the high computational demands of terrain perception. This paper presents a robust reinforcement learning-based exteroceptive locomotion controller for resource-constrained small-scale quadrupeds in challenging terrains, which exploits real-time elevation mapping, supported by a careful depth sensor selection. We concurrently train both a policy and a state estimator, which together provide an odometry source for elevation mapping, optionally fused with visual-inertial odometry (VIO). We demonstrate the importance of positioning an additional time-of-flight sensor for maintaining robustness even without VIO, thus having the potential to free up computational resources. We experimentally demonstrate that the proposed controller can flawlessly traverse steps up to 17.5 cm in height and achieve an 80% success rate on 22.5 cm steps, both with and without VIO. The proposed controller also achieves accurate forward and yaw velocity tracking of up to 1.0 m/s and 1.5 rad/s respectively. We open-source our training code at github.com/ETH-PBL/elmap-rl-controller.
comment: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Atlanta 2025. The code is available at github.com/ETH-PBL/elmap-rl-controller
Optimal Task and Motion Planning for Autonomous Systems Using Petri Nets
This study deals with the problem of task and motion planning of autonomous systems within the context of high-level tasks. Specifically, a task comprises logical requirements (conjunctions, disjunctions, and negations) on the trajectories and final states of agents in certain regions of interest. We propose an optimal planning approach that combines offline computation and online planning. First, a simplified Petri net system is proposed to model the autonomous system. Then, indicating places are designed to implement the logical requirements of the specifications. Building upon this, a compact representation of the state space called extended basis reachability graph is constructed and an efficient online planning algorithm is developed to obtain the optimal plan. It is shown that the most burdensome part of the planning procedure may be removed offline, thanks to the construction of the extended basis reachability graph. Finally, series of simulations are conducted to demonstrate the computational efficiency and scalability of our developed method.
Unleashing Automated Congestion Control Customization in the Wild
Congestion control (CC) crucially impacts user experience across Internet services like streaming, gaming, AR/VR, and connected cars. Traditionally, CC algorithm design seeks universal control rules that yield high performance across diverse application domains and networks. However, varying service needs and network conditions challenge this approach. We share operational experience with a system that automatically customizes congestion control logic to service needs and network conditions. We discuss design, deployment challenges, and solutions, highlighting performance benefits through case studies in streaming, gaming, connected cars, and more. Our system leverages PCC Vivace, an online-learning based congestion control protocol developed by researchers. Hence, along with insights from customizing congestion control, we also discuss lessons learned and modifications made to adapt PCC Vivace for real-world deployment.
Time-Continuous Frequency Allocation for Feeder Links of Mega Constellations with Multi-Antenna Gateway Stations
With the recent rapid advancement of mega low earth orbit (LEO) satellite constellations, multi-antenna gateway station (MAGS) has emerged as a key enabler to support extremely high system capacity via massive feeder links. However, the densification of both space and ground segment leads to reduced spatial separation between links, posing unprecedented challenges of interference exacerbation. This paper investigates graph coloring-based frequency allocation methods for interference mitigation (IM) of mega LEO systems. We first reveal the characteristics of MAGS interference pattern and formulate the IM problem into a $K$-coloring problem using an adaptive threshold method. Then we propose two tailored graph coloring algorithms, namely Generalized Global (GG) and Clique-Based Tabu Search (CTS), to solve this problem. GG employs a low-complexity greedy conflict avoidance strategy, while CTS leverages the unique clique structure brought by MAGSs to enhance IM performance. Subsequently, we innovatively modify them to achieve time-continuous frequency allocation, which is crucial to ensure the stability of feeder links. Moreover, we further devise two mega constellation decomposition methods to alleviate the complexity burden of satellite operators. Finally, we propose a list coloring-based vacant subchannel utilization method to further improve spectrum efficiency and system capacity. Simulation results on Starlink constellation of the first and second generations with 34396 satellites demonstrate the effectiveness and superiority of the proposed methodology.
Data-Efficient Automatic Shaping of Liquid Droplets on an Air-Ferrofluid Interface with Bayesian Optimization
Manipulating the shape of a liquid droplet is essential for a wide range of applications in medicine and industry. However, existing methods are typically limited to generating simple shapes, such as ellipses, or rely on predefined templates. Although recent approaches have demonstrated more complex geometries, they remain constrained by limited adaptability and lack of real-time control. Here, we introduce a data-efficient method that enables real-time, programmable shaping of nonmagnetic liquid droplets into diverse target forms at the air-ferrofluid interface using Bayesian optimization. The droplet can adopt either convex or concave shapes depending on the actuation of the surrounding electromagnets. Bayesian optimization determines the optimal magnetic flux density for shaping the liquid droplet into a desired target shape. Our method enables automatic shaping into various triangular and rectangular shapes with a maximum shape error of 0.81 mm, as well as into letter-like patterns. To the best of our knowledge, this is the first demonstration of real-time, automatic shaping of nonmagnetic liquid droplets into desired target shapes using magnetic fields or other external energy fields.
comment: 7 pages, 5 figures
Impact of Power Fluctuations on Frequency Quality
This paper analyzes how power injections affect frequency quality in power systems. We first derive a general expression linking active and reactive power injections at buses to the system's frequency. This formulation explicitly considers both real and imaginary frequency components, providing a complete description of frequency behavior in power systems during transients. Next, we extend our analysis to incorporate stochastic variations of power injections. Using the frequency divider concept and power-based frequency estimation, we develop analytical relationships linking stochastic load fluctuations to frequency deviations. We discuss under which conditions the Central Limit Theorem cannot be applied to capture the frequency distribution, thereby clarifying how its hypotheses are not satisfied in power system applications. Then, we establish clear criteria for the appropriate use of statistical methods in frequency analysis. Finally, we validate our theoretical results through simulations on modified IEEE 14-bus and all-island Irish transmission test systems, highlighting the accuracy, practical utility, and limitations of our proposed formulation.
Adaptive MPC-based quadrupedal robot control under periodic disturbances
Recent advancements in adaptive control for reference trajectory tracking enable quadrupedal robots to perform locomotion tasks under challenging conditions. There are methods enabling the estimation of the external disturbances in terms of forces and torques. However, a specific case of disturbances that are periodic was not explicitly tackled in application to quadrupeds. This work is devoted to the estimation of the periodic disturbances with a lightweight regressor using simplified robot dynamics and extracting the disturbance properties in terms of the magnitude and frequency. Experimental evidence suggests performance improvement over the baseline static disturbance compensation. All source files, including simulation setups, code, and calculation scripts, are available on GitHub at https://github.com/aidagroup/quad-periodic-mpc.
A universal policy wrapper with guarantees
We introduce a universal policy wrapper for reinforcement learning agents that ensures formal goal-reaching guarantees. In contrast to standard reinforcement learning algorithms that excel in performance but lack rigorous safety assurances, our wrapper selectively switches between a high-performing base policy -- derived from any existing RL method -- and a fallback policy with known convergence properties. Base policy's value function supervises this switching process, determining when the fallback policy should override the base policy to ensure the system remains on a stable path. The analysis proves that our wrapper inherits the fallback policy's goal-reaching guarantees while preserving or improving upon the performance of the base policy. Notably, it operates without needing additional system knowledge or online constrained optimization, making it readily deployable across diverse reinforcement learning architectures and tasks.
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically-backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks demonstrates enhanced performance while maintaining stability guarantees.
Finite-time stabilization of ladder multi-level quantum systems
In this paper, a novel continuous non-smooth control strategy is proposed to achieve finite-time stabilization of ladder quantum systems. We first design a universal fractional-order control law for a ladder n-level quantum system using a distance-based Lyapunov function, and then apply the Filippov solution in the sense of differential inclusions and the LaSalle's invariance principle to prove the existence and uniqueness of the solution of the ladder system under the continuous non-smooth control law. Both asymptotic stability and finite-time stability for the ladder system is rigorously established by applying Lyapunov stability theory and finite-time stability criteria. We also derive an upper bound of the time required for convergence to an eigenstate of the intrinsic Hamiltonian. Numerical simulations on a rubidium ladder three-level atomic system validate the effectiveness of the proposed method.
comment: 12 pages, 4 figures
Carbon Footprint Reduction for Sustainable Data Centers in Real-Time
As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. This requires a paradigm shift in optimizing power consumption in cooling and IT loads, shifting flexible loads based on the availability of renewable energy in the power grid, and leveraging battery storage from the uninterrupted power supply in data centers, using collaborative agents. The complex association between these optimization strategies and their dependencies on variable external factors like weather and the power grid carbon intensity makes this a hard problem. Currently, a real-time controller to optimize all these goals simultaneously in a dynamic real-world setting is lacking. We propose a Data Center Carbon Footprint Reduction (DC-CFR) multi-agent Reinforcement Learning (MARL) framework that optimizes data centers for the multiple objectives of carbon footprint reduction, energy consumption, and energy cost. The results show that the DC-CFR MARL agents effectively resolved the complex interdependencies in optimizing cooling, load shifting, and energy storage in real-time for various locations under real-world dynamic weather and grid carbon intensity conditions. DC-CFR significantly outperformed the industry standard ASHRAE controller with a considerable reduction in carbon emissions (14.5%), energy usage (14.4%), and energy cost (13.7%) when evaluated over one year across multiple geographical regions.
A Cyclic Small Phase Theorem
This paper introduces a brand-new phase definition called the segmental phase for multi-input multi-output linear time-invariant systems. The underpinning of the definition lies in the matrix segmental phase which, as its name implies, is graphically based on the smallest circular segment covering the matrix normalized numerical range in the unit disk. The matrix segmental phase has the crucial product eigen-phase bound, which makes itself stand out from several existing phase notions in the literature. The proposed bound paves the way for stability analysis of a single-loop cyclic feedback system consisting of multiple subsystems. A cyclic small phase theorem is then established as our main result, which requires the loop system phase to lie between $-\pi$ and $\pi$. The proposed theorem complements a cyclic version of the celebrated small gain theorem. In addition, a generalization of the proposed theorem is made via the use of angular scaling techniques for reducing conservatism.
Value-Oriented Forecast Combinations for Unit Commitment
Value-oriented forecasts for two-stage power system operational problems have been demonstrated to reduce cost, but prove to be computationally challenging for large-scale systems because the underlying optimization problem must be internalized into the forecast model training. Therefore, existing approaches typically scale poorly in the usable training data or require relaxations of the underlying optimization. This paper presents a method for value-oriented forecast combinations using progressive hedging, which unlocks high-fidelity, at-scale models and large-scale datasets in training. We also derive one-shot training model for reference and study how different modifications of the training model impact the solution quality.
Timely Trajectory Reconstruction in Finite Buffer Remote Tracking Systems
Remote tracking systems play a critical role in applications such as IoT, monitoring, surveillance and healthcare. In such systems, maintaining both real-time state awareness (for online decision making) and accurate reconstruction of historical trajectories (for offline post-processing) are essential. While the Age of Information (AoI) metric has been extensively studied as a measure of freshness, it does not capture the accuracy with which past trajectories can be reconstructed. In this work, we investigate reconstruction error as a complementary metric to AoI, addressing the trade-off between timely updates and historical accuracy. Specifically, we consider three policies, each prioritizing different aspects of information management: Keep-Old, Keep-Fresh, and our proposed Inter-arrival-Aware dropping policy. We compare these policies in terms of impact on both AoI and reconstruction error in a remote tracking system with a finite buffer. Through theoretical analysis and numerical simulations of queueing behavior, we demonstrate that while the Keep-Fresh policy minimizes AoI, it does not necessarily minimize reconstruction error. In contrast, our proposed Inter-arrival-Aware dropping policy dynamically adjusts packet retention decisions based on generation times, achieving a balance between AoI and reconstruction error. Our results provide key insights into the design of efficient buffer management policies for resource-constrained IoT networks.
Safety Critical Control for Nonlinear Systems with Complex Input Constraints
In this paper, we propose a novel Control Barrier Function (CBF) based controller for nonlinear systems with complex, time-varying input constraints. To deal with these constraints, we introduce an auxiliary control input to transform the original system into an augmented one, thus reformulating the constrained-input problem into a constrained-output one. This transformation simplifies the Quadratic Programming (QP) formulation and enhances compatibility with the CBF framework. As a result, the proposed method can systematically address the complex, time-varying, and state-dependent input constraints. The efficacy of the proposed approach is validated using numerical examples.
comment: 8 pages, 2 figures
Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition of the WNCS, which is stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that can stabilize all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem into a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.
comment: Accepted by IEEE Transactions on Cybernetics
Nonlinear integral extension of PID control with improved convergence of perturbed second-order dynamic systems
Nonlinear extension of the integral part of a standard proportional-integral-derivative (PID) feedback control is proposed for perturbed second-order systems. The approach is model-free and requires solely the Lipschitz boundedness of the unknown matched perturbations. For constant disturbances, the global asymptotic stability is shown based on the circle criterion. For Lipschitz perturbations, an ultimately bounded output error is provided based on the steady-state behavior in frequency domain. Also the transient response to the stepwise disturbances is analyzed for the control tuning. Based on the developed analysis, the design recommendations are formulated as a step by step procedure. It is also discussed how the proposed control is applicable to second-order systems extended by additional (parasitic) actuator dynamics with low-pass characteristics. The proposed nonlinear control is proven to outperform its linear PID counterpart during the settling phase, i.e. at convergence of the residual output error. An experimental case study of the second-order system with an additional actuator dynamics and considerable perturbations is demonstrated to confirm and benchmark the control performance.
comment: 13 pages, 9 figures
Electrifying Heavy-Duty Trucks: Battery-Swapping vs Fast Charging
The advantages and disadvantages of Battery Swapping Stations (BSS) for heavy-duty trucks are poorly understood, relative to Fast Charging Stations (FCS) systems. This study evaluates these two charging mechanisms for electric heavy-duty trucks, aiming to compare the systems' efficiency and identify their optimal design. A model was developed to address the planning and operation of BSS in a charging network, considering in-station batteries as assets for various services. We assess performance metrics including transportation efficiency and battery utilization efficiency. Our evaluation reveals that BSS significantly increased transportation efficiency by reducing vehicle downtime compared to fast charging, but may require more batteries. BSS with medium-sized batteries offers improved transportation efficiency in terms of time and labor. FCS-reliant trucks require larger batteries to compensate for extended charging times. To understand the trade-off between these two metrics, a cost-benefit analysis was performed under different scenarios involving potential shifts in battery prices and labor costs. Additionally, BSS shows potential for significant $\text{CO}_2$ emission reductions and increased profitability through energy arbitrage and grid ancillary services. These findings emphasize the importance of integrating BSS into future electric truck charging networks and adopting carbon-aware operational frameworks.
Systems and Control (EESS)
Robust Reinforcement Learning-Based Locomotion for Resource-Constrained Quadrupeds with Exteroceptive Sensing ICRA
Compact quadrupedal robots are proving increasingly suitable for deployment in real-world scenarios. Their smaller size fosters easy integration into human environments. Nevertheless, real-time locomotion on uneven terrains remains challenging, particularly due to the high computational demands of terrain perception. This paper presents a robust reinforcement learning-based exteroceptive locomotion controller for resource-constrained small-scale quadrupeds in challenging terrains, which exploits real-time elevation mapping, supported by a careful depth sensor selection. We concurrently train both a policy and a state estimator, which together provide an odometry source for elevation mapping, optionally fused with visual-inertial odometry (VIO). We demonstrate the importance of positioning an additional time-of-flight sensor for maintaining robustness even without VIO, thus having the potential to free up computational resources. We experimentally demonstrate that the proposed controller can flawlessly traverse steps up to 17.5 cm in height and achieve an 80% success rate on 22.5 cm steps, both with and without VIO. The proposed controller also achieves accurate forward and yaw velocity tracking of up to 1.0 m/s and 1.5 rad/s respectively. We open-source our training code at github.com/ETH-PBL/elmap-rl-controller.
comment: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Atlanta 2025. The code is available at github.com/ETH-PBL/elmap-rl-controller
Optimal Task and Motion Planning for Autonomous Systems Using Petri Nets
This study deals with the problem of task and motion planning of autonomous systems within the context of high-level tasks. Specifically, a task comprises logical requirements (conjunctions, disjunctions, and negations) on the trajectories and final states of agents in certain regions of interest. We propose an optimal planning approach that combines offline computation and online planning. First, a simplified Petri net system is proposed to model the autonomous system. Then, indicating places are designed to implement the logical requirements of the specifications. Building upon this, a compact representation of the state space called extended basis reachability graph is constructed and an efficient online planning algorithm is developed to obtain the optimal plan. It is shown that the most burdensome part of the planning procedure may be removed offline, thanks to the construction of the extended basis reachability graph. Finally, series of simulations are conducted to demonstrate the computational efficiency and scalability of our developed method.
Unleashing Automated Congestion Control Customization in the Wild
Congestion control (CC) crucially impacts user experience across Internet services like streaming, gaming, AR/VR, and connected cars. Traditionally, CC algorithm design seeks universal control rules that yield high performance across diverse application domains and networks. However, varying service needs and network conditions challenge this approach. We share operational experience with a system that automatically customizes congestion control logic to service needs and network conditions. We discuss design, deployment challenges, and solutions, highlighting performance benefits through case studies in streaming, gaming, connected cars, and more. Our system leverages PCC Vivace, an online-learning based congestion control protocol developed by researchers. Hence, along with insights from customizing congestion control, we also discuss lessons learned and modifications made to adapt PCC Vivace for real-world deployment.
Time-Continuous Frequency Allocation for Feeder Links of Mega Constellations with Multi-Antenna Gateway Stations
With the recent rapid advancement of mega low earth orbit (LEO) satellite constellations, multi-antenna gateway station (MAGS) has emerged as a key enabler to support extremely high system capacity via massive feeder links. However, the densification of both space and ground segment leads to reduced spatial separation between links, posing unprecedented challenges of interference exacerbation. This paper investigates graph coloring-based frequency allocation methods for interference mitigation (IM) of mega LEO systems. We first reveal the characteristics of MAGS interference pattern and formulate the IM problem into a $K$-coloring problem using an adaptive threshold method. Then we propose two tailored graph coloring algorithms, namely Generalized Global (GG) and Clique-Based Tabu Search (CTS), to solve this problem. GG employs a low-complexity greedy conflict avoidance strategy, while CTS leverages the unique clique structure brought by MAGSs to enhance IM performance. Subsequently, we innovatively modify them to achieve time-continuous frequency allocation, which is crucial to ensure the stability of feeder links. Moreover, we further devise two mega constellation decomposition methods to alleviate the complexity burden of satellite operators. Finally, we propose a list coloring-based vacant subchannel utilization method to further improve spectrum efficiency and system capacity. Simulation results on Starlink constellation of the first and second generations with 34396 satellites demonstrate the effectiveness and superiority of the proposed methodology.
Data-Efficient Automatic Shaping of Liquid Droplets on an Air-Ferrofluid Interface with Bayesian Optimization
Manipulating the shape of a liquid droplet is essential for a wide range of applications in medicine and industry. However, existing methods are typically limited to generating simple shapes, such as ellipses, or rely on predefined templates. Although recent approaches have demonstrated more complex geometries, they remain constrained by limited adaptability and lack of real-time control. Here, we introduce a data-efficient method that enables real-time, programmable shaping of nonmagnetic liquid droplets into diverse target forms at the air-ferrofluid interface using Bayesian optimization. The droplet can adopt either convex or concave shapes depending on the actuation of the surrounding electromagnets. Bayesian optimization determines the optimal magnetic flux density for shaping the liquid droplet into a desired target shape. Our method enables automatic shaping into various triangular and rectangular shapes with a maximum shape error of 0.81 mm, as well as into letter-like patterns. To the best of our knowledge, this is the first demonstration of real-time, automatic shaping of nonmagnetic liquid droplets into desired target shapes using magnetic fields or other external energy fields.
comment: 7 pages, 5 figures
Impact of Power Fluctuations on Frequency Quality
This paper analyzes how power injections affect frequency quality in power systems. We first derive a general expression linking active and reactive power injections at buses to the system's frequency. This formulation explicitly considers both real and imaginary frequency components, providing a complete description of frequency behavior in power systems during transients. Next, we extend our analysis to incorporate stochastic variations of power injections. Using the frequency divider concept and power-based frequency estimation, we develop analytical relationships linking stochastic load fluctuations to frequency deviations. We discuss under which conditions the Central Limit Theorem cannot be applied to capture the frequency distribution, thereby clarifying how its hypotheses are not satisfied in power system applications. Then, we establish clear criteria for the appropriate use of statistical methods in frequency analysis. Finally, we validate our theoretical results through simulations on modified IEEE 14-bus and all-island Irish transmission test systems, highlighting the accuracy, practical utility, and limitations of our proposed formulation.
Adaptive MPC-based quadrupedal robot control under periodic disturbances
Recent advancements in adaptive control for reference trajectory tracking enable quadrupedal robots to perform locomotion tasks under challenging conditions. There are methods enabling the estimation of the external disturbances in terms of forces and torques. However, a specific case of disturbances that are periodic was not explicitly tackled in application to quadrupeds. This work is devoted to the estimation of the periodic disturbances with a lightweight regressor using simplified robot dynamics and extracting the disturbance properties in terms of the magnitude and frequency. Experimental evidence suggests performance improvement over the baseline static disturbance compensation. All source files, including simulation setups, code, and calculation scripts, are available on GitHub at https://github.com/aidagroup/quad-periodic-mpc.
A universal policy wrapper with guarantees
We introduce a universal policy wrapper for reinforcement learning agents that ensures formal goal-reaching guarantees. In contrast to standard reinforcement learning algorithms that excel in performance but lack rigorous safety assurances, our wrapper selectively switches between a high-performing base policy -- derived from any existing RL method -- and a fallback policy with known convergence properties. Base policy's value function supervises this switching process, determining when the fallback policy should override the base policy to ensure the system remains on a stable path. The analysis proves that our wrapper inherits the fallback policy's goal-reaching guarantees while preserving or improving upon the performance of the base policy. Notably, it operates without needing additional system knowledge or online constrained optimization, making it readily deployable across diverse reinforcement learning architectures and tasks.
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically-backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks demonstrates enhanced performance while maintaining stability guarantees.
Finite-time stabilization of ladder multi-level quantum systems
In this paper, a novel continuous non-smooth control strategy is proposed to achieve finite-time stabilization of ladder quantum systems. We first design a universal fractional-order control law for a ladder n-level quantum system using a distance-based Lyapunov function, and then apply the Filippov solution in the sense of differential inclusions and the LaSalle's invariance principle to prove the existence and uniqueness of the solution of the ladder system under the continuous non-smooth control law. Both asymptotic stability and finite-time stability for the ladder system is rigorously established by applying Lyapunov stability theory and finite-time stability criteria. We also derive an upper bound of the time required for convergence to an eigenstate of the intrinsic Hamiltonian. Numerical simulations on a rubidium ladder three-level atomic system validate the effectiveness of the proposed method.
comment: 12 pages, 4 figures
Carbon Footprint Reduction for Sustainable Data Centers in Real-Time
As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. This requires a paradigm shift in optimizing power consumption in cooling and IT loads, shifting flexible loads based on the availability of renewable energy in the power grid, and leveraging battery storage from the uninterrupted power supply in data centers, using collaborative agents. The complex association between these optimization strategies and their dependencies on variable external factors like weather and the power grid carbon intensity makes this a hard problem. Currently, a real-time controller to optimize all these goals simultaneously in a dynamic real-world setting is lacking. We propose a Data Center Carbon Footprint Reduction (DC-CFR) multi-agent Reinforcement Learning (MARL) framework that optimizes data centers for the multiple objectives of carbon footprint reduction, energy consumption, and energy cost. The results show that the DC-CFR MARL agents effectively resolved the complex interdependencies in optimizing cooling, load shifting, and energy storage in real-time for various locations under real-world dynamic weather and grid carbon intensity conditions. DC-CFR significantly outperformed the industry standard ASHRAE controller with a considerable reduction in carbon emissions (14.5%), energy usage (14.4%), and energy cost (13.7%) when evaluated over one year across multiple geographical regions.
A Cyclic Small Phase Theorem
This paper introduces a brand-new phase definition called the segmental phase for multi-input multi-output linear time-invariant systems. The underpinning of the definition lies in the matrix segmental phase which, as its name implies, is graphically based on the smallest circular segment covering the matrix normalized numerical range in the unit disk. The matrix segmental phase has the crucial product eigen-phase bound, which makes itself stand out from several existing phase notions in the literature. The proposed bound paves the way for stability analysis of a single-loop cyclic feedback system consisting of multiple subsystems. A cyclic small phase theorem is then established as our main result, which requires the loop system phase to lie between $-\pi$ and $\pi$. The proposed theorem complements a cyclic version of the celebrated small gain theorem. In addition, a generalization of the proposed theorem is made via the use of angular scaling techniques for reducing conservatism.
Value-Oriented Forecast Combinations for Unit Commitment
Value-oriented forecasts for two-stage power system operational problems have been demonstrated to reduce cost, but prove to be computationally challenging for large-scale systems because the underlying optimization problem must be internalized into the forecast model training. Therefore, existing approaches typically scale poorly in the usable training data or require relaxations of the underlying optimization. This paper presents a method for value-oriented forecast combinations using progressive hedging, which unlocks high-fidelity, at-scale models and large-scale datasets in training. We also derive one-shot training model for reference and study how different modifications of the training model impact the solution quality.
Timely Trajectory Reconstruction in Finite Buffer Remote Tracking Systems
Remote tracking systems play a critical role in applications such as IoT, monitoring, surveillance and healthcare. In such systems, maintaining both real-time state awareness (for online decision making) and accurate reconstruction of historical trajectories (for offline post-processing) are essential. While the Age of Information (AoI) metric has been extensively studied as a measure of freshness, it does not capture the accuracy with which past trajectories can be reconstructed. In this work, we investigate reconstruction error as a complementary metric to AoI, addressing the trade-off between timely updates and historical accuracy. Specifically, we consider three policies, each prioritizing different aspects of information management: Keep-Old, Keep-Fresh, and our proposed Inter-arrival-Aware dropping policy. We compare these policies in terms of impact on both AoI and reconstruction error in a remote tracking system with a finite buffer. Through theoretical analysis and numerical simulations of queueing behavior, we demonstrate that while the Keep-Fresh policy minimizes AoI, it does not necessarily minimize reconstruction error. In contrast, our proposed Inter-arrival-Aware dropping policy dynamically adjusts packet retention decisions based on generation times, achieving a balance between AoI and reconstruction error. Our results provide key insights into the design of efficient buffer management policies for resource-constrained IoT networks.
Safety Critical Control for Nonlinear Systems with Complex Input Constraints
In this paper, we propose a novel Control Barrier Function (CBF) based controller for nonlinear systems with complex, time-varying input constraints. To deal with these constraints, we introduce an auxiliary control input to transform the original system into an augmented one, thus reformulating the constrained-input problem into a constrained-output one. This transformation simplifies the Quadratic Programming (QP) formulation and enhances compatibility with the CBF framework. As a result, the proposed method can systematically address the complex, time-varying, and state-dependent input constraints. The efficacy of the proposed approach is validated using numerical examples.
comment: 8 pages, 2 figures
Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition of the WNCS, which is stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that can stabilize all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem into a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.
comment: Accepted by IEEE Transactions on Cybernetics
Nonlinear integral extension of PID control with improved convergence of perturbed second-order dynamic systems
Nonlinear extension of the integral part of a standard proportional-integral-derivative (PID) feedback control is proposed for perturbed second-order systems. The approach is model-free and requires solely the Lipschitz boundedness of the unknown matched perturbations. For constant disturbances, the global asymptotic stability is shown based on the circle criterion. For Lipschitz perturbations, an ultimately bounded output error is provided based on the steady-state behavior in frequency domain. Also the transient response to the stepwise disturbances is analyzed for the control tuning. Based on the developed analysis, the design recommendations are formulated as a step by step procedure. It is also discussed how the proposed control is applicable to second-order systems extended by additional (parasitic) actuator dynamics with low-pass characteristics. The proposed nonlinear control is proven to outperform its linear PID counterpart during the settling phase, i.e. at convergence of the residual output error. An experimental case study of the second-order system with an additional actuator dynamics and considerable perturbations is demonstrated to confirm and benchmark the control performance.
comment: 13 pages, 9 figures
Electrifying Heavy-Duty Trucks: Battery-Swapping vs Fast Charging
The advantages and disadvantages of Battery Swapping Stations (BSS) for heavy-duty trucks are poorly understood, relative to Fast Charging Stations (FCS) systems. This study evaluates these two charging mechanisms for electric heavy-duty trucks, aiming to compare the systems' efficiency and identify their optimal design. A model was developed to address the planning and operation of BSS in a charging network, considering in-station batteries as assets for various services. We assess performance metrics including transportation efficiency and battery utilization efficiency. Our evaluation reveals that BSS significantly increased transportation efficiency by reducing vehicle downtime compared to fast charging, but may require more batteries. BSS with medium-sized batteries offers improved transportation efficiency in terms of time and labor. FCS-reliant trucks require larger batteries to compensate for extended charging times. To understand the trade-off between these two metrics, a cost-benefit analysis was performed under different scenarios involving potential shifts in battery prices and labor costs. Additionally, BSS shows potential for significant $\text{CO}_2$ emission reductions and increased profitability through energy arbitrage and grid ancillary services. These findings emphasize the importance of integrating BSS into future electric truck charging networks and adopting carbon-aware operational frameworks.
Robotics
Federated Deep Reinforcement Learning for Privacy-Preserving Robotic-Assisted Surgery
The integration of Reinforcement Learning (RL) into robotic-assisted surgery (RAS) holds significant promise for advancing surgical precision, adaptability, and autonomous decision-making. However, the development of robust RL models in clinical settings is hindered by key challenges, including stringent patient data privacy regulations, limited access to diverse surgical datasets, and high procedural variability. To address these limitations, this paper presents a Federated Deep Reinforcement Learning (FDRL) framework that enables decentralized training of RL models across multiple healthcare institutions without exposing sensitive patient information. A central innovation of the proposed framework is its dynamic policy adaptation mechanism, which allows surgical robots to select and tailor patient-specific policies in real-time, thereby ensuring personalized and Optimised interventions. To uphold rigorous privacy standards while facilitating collaborative learning, the FDRL framework incorporates secure aggregation, differential privacy, and homomorphic encryption techniques. Experimental results demonstrate a 60\% reduction in privacy leakage compared to conventional methods, with surgical precision maintained within a 1.5\% margin of a centralized baseline. This work establishes a foundational approach for adaptive, secure, and patient-centric AI-driven surgical robotics, offering a pathway toward clinical translation and scalable deployment across diverse healthcare environments.
comment: 11 pages, 7 figures, conference
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Mobile robots are increasingly deployed in unstructured environments where obstacles and objects are movable. Navigation in such environments is known as interactive navigation, where task completion requires not only avoiding obstacles but also strategic interactions with movable objects. Non-prehensile interactive navigation focuses on non-grasping interaction strategies, such as pushing, rather than relying on prehensile manipulation. Despite a growing body of research in this field, most solutions are evaluated using case-specific setups, limiting reproducibility and cross-comparison. In this paper, we present Bench-NPIN, the first comprehensive benchmark for non-prehensile interactive navigation. Bench-NPIN includes multiple components: 1) a comprehensive range of simulated environments for non-prehensile interactive navigation tasks, including navigating a maze with movable obstacles, autonomous ship navigation in icy waters, box delivery, and area clearing, each with varying levels of complexity; 2) a set of evaluation metrics that capture unique aspects of interactive navigation, such as efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-NPIN to evaluate example implementations of established baselines across environments. Bench-NPIN is an open-source Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
comment: 8 pages, 10 figures
TrainBo: An Interactive Robot-assisted Scenario Training System for Older Adults with Dementia
Dementia is an overall decline in memory and cognitive skills severe enough to reduce an elders ability to perform everyday activities. There is an increasing need for accessible technologies for cognitive training to slow down the cognitive decline. With the ability to provide instant feedback and assistance, social robotic systems have been proven effective in enhancing learning abilities across various age groups. This study focuses on the design of an interactive robot-assisted scenario training system TrainBo with self-determination theory, derives design requirements through formative and formal studies and the system usability is also be evaluated. A pilot test is conducted on seven older adults with dementia in an elderly care center in Hong Kong for four weeks. Our finding shows that older adults with dementia have an improvement in behavioural engagement, emotional engagement, and intrinsic motivation after using Trainbo. These findings can provide valuable insights into the development of more captivating interactive robots for extensive training purposes.
L2D2: Robot Learning from 2D Drawings
Robots should learn new tasks from humans. But how do humans convey what they want the robot to do? Existing methods largely rely on humans physically guiding the robot arm throughout their intended task. Unfortunately -- as we scale up the amount of data -- physical guidance becomes prohibitively burdensome. Not only do humans need to operate robot hardware but also modify the environment (e.g., moving and resetting objects) to provide multiple task examples. In this work we propose L2D2, a sketching interface and imitation learning algorithm where humans can provide demonstrations by drawing the task. L2D2 starts with a single image of the robot arm and its workspace. Using a tablet, users draw and label trajectories on this image to illustrate how the robot should act. To collect new and diverse demonstrations, we no longer need the human to physically reset the workspace; instead, L2D2 leverages vision-language segmentation to autonomously vary object locations and generate synthetic images for the human to draw upon. We recognize that drawing trajectories is not as information-rich as physically demonstrating the task. Drawings are 2-dimensional and do not capture how the robot's actions affect its environment. To address these fundamental challenges the next stage of L2D2 grounds the human's static, 2D drawings in our dynamic, 3D world by leveraging a small set of physical demonstrations. Our experiments and user study suggest that L2D2 enables humans to provide more demonstrations with less time and effort than traditional approaches, and users prefer drawings over physical manipulation. When compared to other drawing-based approaches, we find that L2D2 learns more performant robot policies, requires a smaller dataset, and can generalize to longer-horizon tasks. See our project website: https://collab.me.vt.edu/L2D2/
Growable and Interpretable Neural Control with Online Continual Learning for Autonomous Lifelong Locomotion Learning Machines
Continual locomotion learning faces four challenges: incomprehensibility, sample inefficiency, lack of knowledge exploitation, and catastrophic forgetting. Thus, this work introduces Growable Online Locomotion Learning Under Multicondition (GOLLUM), which exploits the interpretability feature to address the aforementioned challenges. GOLLUM has two dimensions of interpretability: layer-wise interpretability for neural control function encoding and column-wise interpretability for robot skill encoding. With this interpretable control structure, GOLLUM utilizes neurogenesis to unsupervisely increment columns (ring-like networks); each column is trained separately to encode and maintain a specific primary robot skill. GOLLUM also transfers the parameters to new skills and supplements the learned combination of acquired skills through another neural mapping layer added (layer-wise) with online supplementary learning. On a physical hexapod robot, GOLLUM successfully acquired multiple locomotion skills (e.g., walking, slope climbing, and bouncing) autonomously and continuously within an hour using a simple reward function. Furthermore, it demonstrated the capability of combining previous learned skills to facilitate the learning process of new skills while preventing catastrophic forgetting. Compared to state-of-the-art locomotion learning approaches, GOLLUM is the only approach that addresses the four challenges above mentioned without human intervention. It also emphasizes the potential exploitation of interpretability to achieve autonomous lifelong learning machines.
comment: Accepted Manuscript (IJRR). The International Journal of Robotics Research. 2025
Proactive tactile exploration for object-agnostic shape reconstruction from minimal visual priors
The perception of an object's surface is important for robotic applications enabling robust object manipulation. The level of accuracy in such a representation affects the outcome of the action planning, especially during tasks that require physical contact, e.g. grasping. In this paper, we propose a novel iterative method for 3D shape reconstruction consisting of two steps. At first, a mesh is fitted on data points acquired from the object's surface, based on a single primitive template. Subsequently, the mesh is properly adjusted to adequately represent local deformities. Moreover, a novel proactive tactile exploration strategy aims at minimizing the total uncertainty with the least number of contacts, while reducing the risk of contact failure in case the estimated surface differs significantly from the real one. The performance of the methodology is evaluated both in 3D simulation and on a real setup.
Online Synthesis of Control Barrier Functions with Local Occupancy Grid Maps for Safe Navigation in Unknown Environments
Control Barrier Functions (CBFs) have emerged as an effective and non-invasive safety filter for ensuring the safety of autonomous systems in dynamic environments with formal guarantees. However, most existing works on CBF synthesis focus on fully known settings. Synthesizing CBFs online based on perception data in unknown environments poses particular challenges. Specifically, this requires the construction of CBFs from high-dimensional data efficiently in real time. This paper proposes a new approach for online synthesis of CBFs directly from local Occupancy Grid Maps (OGMs). Inspired by steady-state thermal fields, we show that the smoothness requirement of CBFs corresponds to the solution of the steady-state heat conduction equation with suitably chosen boundary conditions. By leveraging the sparsity of the coefficient matrix in Laplace's equation, our approach allows for efficient computation of safety values for each grid cell in the map. Simulation and real-world experiments demonstrate the effectiveness of our approach. Specifically, the results show that our CBFs can be synthesized in an average of milliseconds on a 200 * 200 grid map, highlighting its real-time applicability.
H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
Large-scale pre-training using videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a simple data augmentation technique that detects human hand keypoints, synthesizes robot motions in simulation, and composites rendered robots into egocentric videos. This process explicitly bridges the visual gap between human and robot embodiments during pre-training. We apply H2R to augment large-scale egocentric human video datasets such as Ego4D and SSv2, replacing human hands with simulated robotic arms to generate robot-centric training data. Based on this, we construct and release a family of 1M-scale datasets covering multiple robot embodiments (UR5 with gripper/Leaphand, Franka) and data sources (SSv2, Ego4D). To verify the effectiveness of the augmentation pipeline, we introduce a CLIP-based image-text similarity metric that quantitatively evaluates the semantic fidelity of robot-rendered frames to the original human actions. We validate H2R across three simulation benchmarks: Robomimic, RLBench and PushT and real-world manipulation tasks with a UR5 robot equipped with Gripper and Leaphand end-effectors. H2R consistently improves downstream success rates, yielding gains of 5.0%-10.2% in simulation and 6.7%-23.3% in real-world tasks across various visual encoders and policy learning methods. These results indicate that H2R improves the generalization ability of robotic policies by mitigating the visual discrepancies between human and robot domains.
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
General-purpose robots capable of performing diverse tasks require synergistic reasoning and acting capabilities. However, recent dual-system approaches, which separate high-level reasoning from low-level acting, often suffer from challenges such as limited mutual understanding of capabilities between systems and latency issues. This paper introduces OneTwoVLA, a single unified vision-language-action model that can perform both acting (System One) and reasoning (System Two). Crucially, OneTwoVLA adaptively switches between two modes: explicitly reasoning at critical moments during task execution, and generating actions based on the most recent reasoning at other times. To further unlock OneTwoVLA's reasoning and generalization capabilities, we design a scalable pipeline for synthesizing embodied reasoning-centric vision-language data, used for co-training with robot data. We validate OneTwoVLA's effectiveness through extensive experiments, highlighting its superior performance across four key capabilities: long-horizon task planning, error detection and recovery, natural human-robot interaction, and generalizable visual grounding, enabling the model to perform long-horizon, highly dexterous manipulation tasks such as making hotpot or mixing cocktails.
GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Existing methods, while achieving impressive results, often struggle with complex objects, particularly those exhibiting symmetry, intricate geometry or complex appearance. To bridge these gaps, we introduce an adaptive method that combines 3D Gaussian Splatting, hybrid geometry/appearance tracking, and key frame selection to achieve robust tracking and accurate reconstructions across a diverse range of objects. Additionally, we present a benchmark covering these challenging object classes, providing high-quality annotations for evaluating both tracking and reconstruction performance. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
comment: main contains 10 pages, 9 figures. And supplementary material contains 10 pages, 27 figures
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Vision-Language Navigation (VLN) is a critical task for developing embodied agents that can follow natural language instructions to navigate in complex real-world environments. Recent advances in VLN by large pretrained models have significantly improved generalization and instruction grounding compared to traditional approaches. However, the role of reasoning strategies in navigation-an action-centric, long-horizon task-remains underexplored, despite Chain-of-Thought (CoT) reasoning's demonstrated success in static tasks like visual question answering. To address this gap, we conduct the first systematic evaluation of reasoning strategies for VLN, including No-Think (direct action prediction), Pre-Think (reason before action), and Post-Think (reason after action). Surprisingly, our findings reveal the Inference-time Reasoning Collapse issue, where inference-time reasoning degrades navigation accuracy, highlighting the challenges of integrating reasoning into VLN. Based on this insight, we propose Aux-Think, a framework that trains models to internalize structured reasoning patterns through CoT supervision, while inferring action directly without reasoning in online prediction. To support this framework, we release R2R-CoT-320k, the first Chain-of-Thought annotated dataset for VLN. Extensive experiments show that Aux-Think reduces training effort greatly and achieves the best performance under the same data scale.
Experimental Study on Automatically Assembling Custom Catering Packages With a 3-DOF Delta Robot Using Deep Learning Methods
This paper introduces a pioneering experimental study on the automated packing of a catering package using a two-fingered gripper affixed to a 3-degree-of-freedom Delta parallel robot. A distinctive contribution lies in the application of a deep learning approach to tackle this challenge. A custom dataset, comprising 1,500 images, is meticulously curated for this endeavor, representing a noteworthy initiative as the first dataset focusing on Persian-manufactured products. The study employs the YOLOV5 model for object detection, followed by segmentation using the FastSAM model. Subsequently, rotation angle calculation is facilitated with segmentation masks, and a rotated rectangle encapsulating the object is generated. This rectangle forms the basis for calculating two grasp points using a novel geometrical approach involving eigenvectors. An extensive experimental study validates the proposed model, where all pertinent information is seamlessly transmitted to the 3-DOF Delta parallel robot. The proposed algorithm ensures real-time detection, calibration, and the fully autonomous packing process of a catering package, boasting an impressive over 80\% success rate in automatic grasping. This study marks a significant stride in advancing the capabilities of robotic systems for practical applications in packaging automation.
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Learning manipulation skills from human demonstration videos offers a promising path toward generalizable and interpretable robotic intelligence-particularly through the lens of actionable affordances. However, transferring such knowledge remains challenging due to: 1) a lack of large-scale datasets with precise affordance annotations, and 2) insufficient exploration of affordances in diverse manipulation contexts. To address these gaps, we introduce HOVA-500K, a large-scale, affordance-annotated dataset comprising 500,000 images across 1,726 object categories and 675 actions. We also release a standardized benchmarking suite for multi-modal affordance reasoning. Built upon HOVA-500K, we present GLOVER++, a global-to-local affordance training framework that effectively transfers actionable affordance knowledge from human demonstrations to downstream open-vocabulary reasoning tasks. GLOVER++ achieves state-of-the-art results on the HOVA-500K benchmark and demonstrates strong generalization across diverse downstream robotic manipulation tasks. By explicitly modeling actionable affordances, GLOVER++ facilitates robust transfer across scenes, modalities, and tasks. We hope that HOVA-500K and the GLOVER++ framework will serve as valuable resources for bridging the gap between human demonstrations and robotic manipulation capabilities.
Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Object insertion under tight tolerances ($< \hspace{-.02in} 1mm$) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach.
PROBE: Proprioceptive Obstacle Detection and Estimation while Navigating in Clutter
In critical applications, including search-and-rescue in degraded environments, blockages can be prevalent and prevent the effective deployment of certain sensing modalities, particularly vision, due to occlusion and the constrained range of view of onboard camera sensors. To enable robots to tackle these challenges, we propose a new approach, Proprioceptive Obstacle Detection and Estimation while navigating in clutter PROBE, which instead relies only on the robot's proprioception to infer the presence or absence of occluded rectangular obstacles while predicting their dimensions and poses in SE(2). The proposed approach is a Transformer neural network that receives as input a history of applied torques and sensed whole-body movements of the robot and returns a parameterized representation of the obstacles in the environment. The effectiveness of PROBE is evaluated on simulated environments in Isaac Gym and with a real Unitree Go1 quadruped robot.
Bridging the Reality Gap in Digital Twins with Context-Aware, Physics-Guided Deep Learning SC
Digital twins (DTs) enable powerful predictive analytics, but persistent discrepancies between simulations and real systems--known as the reality gap--undermine their reliability. Coined in robotics, the term now applies to DTs, where discrepancies stem from context mismatches, cross-domain interactions, and multi-scale dynamics. Among these, context mismatch is pressing and underexplored, as DT accuracy depends on capturing operational context, often only partially observable. However, DTs have a key advantage: simulators can systematically vary contextual factors and explore scenarios difficult or impossible to observe empirically, informing inference and model alignment. While sim-to-real transfer like domain adaptation shows promise in robotics, their application to DTs poses two key challenges. First, unlike one-time policy transfers, DTs require continuous calibration across an asset's lifecycle--demanding structured information flow, timely detection of out-of-sync states, and integration of historical and new data. Second, DTs often perform inverse modeling, inferring latent states or faults from observations that may reflect multiple evolving contexts. These needs strain purely data-driven models and risk violating physical consistency. Though some approaches preserve validity via reduced-order model, most domain adaptation techniques still lack such constraints. To address this, we propose a Reality Gap Analysis (RGA) module for DTs that continuously integrates new sensor data, detects misalignments, and recalibrates DTs via a query-response framework. Our approach fuses domain-adversarial deep learning with reduced-order simulator guidance to improve context inference and preserve physical consistency. We illustrate the RGA module in a structural health monitoring case study on a steel truss bridge in Pittsburgh, PA, showing faster calibration and better real-world alignment.
comment: Submitted to ASCE Journal of Computing in Civil Engineering
Master Rules from Chaos: Learning to Reason, Plan, and Interact from Chaos for Tangram Assembly ICRA 2025
Tangram assembly, the art of human intelligence and manipulation dexterity, is a new challenge for robotics and reveals the limitations of state-of-the-arts. Here, we describe our initial exploration and highlight key problems in reasoning, planning, and manipulation for robotic tangram assembly. We present MRChaos (Master Rules from Chaos), a robust and general solution for learning assembly policies that can generalize to novel objects. In contrast to conventional methods based on prior geometric and kinematic models, MRChaos learns to assemble randomly generated objects through self-exploration in simulation without prior experience in assembling target objects. The reward signal is obtained from the visual observation change without manually designed models or annotations. MRChaos retains its robustness in assembling various novel tangram objects that have never been encountered during training, with only silhouette prompts. We show the potential of MRChaos in wider applications such as cutlery combinations. The presented work indicates that radical generalization in robotic assembly can be achieved by learning in much simpler domains.
comment: 7 pages, accepted by ICRA 2025
Human-Centered Development of Guide Dog Robots: Quiet and Stable Locomotion Control
A quadruped robot is a promising system that can offer assistance comparable to that of dog guides due to its similar form factor. However, various challenges remain in making these robots a reliable option for blind and low-vision (BLV) individuals. Among these challenges, noise and jerky motion during walking are critical drawbacks of existing quadruped robots. While these issues have largely been overlooked in guide dog robot research, our interviews with guide dog handlers and trainers revealed that acoustic and physical disturbances can be particularly disruptive for BLV individuals, who rely heavily on environmental sounds for navigation. To address these issues, we developed a novel walking controller for slow stepping and smooth foot swing/contact while maintaining human walking speed, as well as robust and stable balance control. The controller integrates with a perception system to facilitate locomotion over non-flat terrains, such as stairs. Our controller was extensively tested on the Unitree Go1 robot and, when compared with other control methods, demonstrated significant noise reduction -- half of the default locomotion controller. In this study, we adopt a mixed-methods approach to evaluate its usability with BLV individuals. In our indoor walking experiments, participants compared our controller to the robot's default controller. Results demonstrated superior acceptance of our controller, highlighting its potential to improve the user experience of guide dog robots. Video demonstration (best viewed with audio) available at: https://youtu.be/8-pz_8Hqe6s.
Gaussian Splatting as a Unified Representation for Autonomy in Unstructured Environments
In this work, we argue that Gaussian splatting is a suitable unified representation for autonomous robot navigation in large-scale unstructured outdoor environments. Such environments require representations that can capture complex structures while remaining computationally tractable for real-time navigation. We demonstrate that the dense geometric and photometric information provided by a Gaussian splatting representation is useful for navigation in unstructured environments. Additionally, semantic information can be embedded in the Gaussian map to enable large-scale task-driven navigation. From the lessons learned through our experiments, we highlight several challenges and opportunities arising from the use of such a representation for robot autonomy.
Learning IMU Bias with Diffusion Model ICRA 2025
Motion sensing and tracking with IMU data is essential for spatial intelligence, which however is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU readings. However, these methods often treat the task as a regression problem, overlooking the stochatic nature of bias. In contrast, we model bias, conditioned on IMU readings, as a probabilistic distribution and design a conditional diffusion model to approximate this distribution. Through this approach, we achieve improved performance and make predictions that align more closely with the known behavior of bias.
comment: accepted to ICRA 2025
AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect
Finding high-quality solutions quickly is an important objective in motion planning. This is especially true for high-degree-of-freedom robots. Satisficing planners have traditionally found feasible solutions quickly but provide no guarantees on their optimality, while almost-surely asymptotically optimal (a.s.a.o.) planners have probabilistic guarantees on their convergence towards an optimal solution but are more computationally expensive. This paper uses the AO-x meta-algorithm to extend the satisficing RRT-Connect planner to optimal planning. The resulting Asymptotically Optimal RRT-Connect (AORRTC) finds initial solutions in similar times as RRT-Connect and uses any additional planning time to converge towards the optimal solution in an anytime manner. It is proven to be probabilistically complete and a.s.a.o. AORRTC was tested with the Panda (7 DoF) and Fetch (8 DoF) robotic arms on the MotionBenchMaker dataset. These experiments show that AORRTC finds initial solutions as fast as RRT-Connect and faster than the tested state-of-the-art a.s.a.o. algorithms while converging to better solutions faster. AORRTC finds solutions to difficult high-DoF planning problems in milliseconds where the other a.s.a.o. planners could not consistently find solutions in seconds. This performance was demonstrated both with and without single instruction/multiple data (SIMD) acceleration.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L). Manuscript #25-1915. 8 pages, 4 figures, 1 table. A video of AORRTC can be found at https://www.youtube.com/watch?v=j1itxP3KuiM . Information on the implementation of AORRTC is available at https://robotic-esp.com/code/aorrtc/
Take Your Best Shot: Sampling-Based Next-Best-View Planning for Autonomous Photography & Inspection
Autonomous mobile robots (AMRs) equipped with high-quality cameras have revolutionized the field of inspections by providing efficient and cost-effective means of conducting surveys. The use of autonomous inspection is becoming more widespread in a variety of contexts, yet it is still challenging to acquire the best inspection information autonomously. In situations where objects may block a robot's view, it is necessary to use reasoning to determine the optimal points for collecting data. Although researchers have explored cloud-based applications to store inspection data, these applications may not operate optimally under network constraints, and parsing these datasets can be manually intensive. Instead, there is an emerging requirement for AMRs to autonomously capture the most informative views efficiently. To address this challenge, we present an autonomous Next-Best-View (NBV) framework that maximizes the inspection information while reducing the number of pictures needed during operations. The framework consists of a formalized evaluation metric using ray-tracing and Gaussian process interpolation to estimate information reward based on the current understanding of the partially-known environment. A derivative-free optimization (DFO) method is used to sample candidate views in the environment and identify the NBV point. The proposed approach's effectiveness is shown by comparing it with existing methods and further validated through simulations and experiments with various vehicles.
comment: For code and videos, see https://www.bezzorobotics.com/sg-lb-icra25
On the Surprising Effectiveness of Spectrum Clipping in Learning Stable Linear Dynamics
When learning stable linear dynamical systems from data, three important properties are desirable: i) predictive accuracy, ii) provable stability, and iii) computational efficiency. Unconstrained minimization of reconstruction errors leads to high accuracy and efficiency but cannot guarantee stability. Existing methods to enforce stability often preserve accuracy, but do so only at the cost of increased computation. In this work, we investigate if a straightforward approach can simultaneously offer all three desiderata of learning stable linear systems. Specifically, we consider a post-hoc approach that manipulates the spectrum of the learned system matrix that was computed using unconstrained least squares. We call this approach spectrum clipping (SC) as it involves eigen decomposition and subsequent reconstruction of the system matrix after clipping any eigenvalues that are larger than one to one (without altering the eigenvectors). Through comprehensive experiments involving two different applications and publicly available benchmark datasets, we show that this simple technique can efficiently learn highly-accurate linear systems that are provably-stable. Notably, we find that SC can match or outperform strong baselines while being orders-of-magnitude faster. We also show that SC can be readily combined with Koopman operators to learn stable nonlinear dynamics, such as those underlying complex dexterous manipulation skills involving multi-fingered robotic hands. Finally, we find that SC can learn stable robot policies even when the training data includes unsuccessful or truncated demonstrations. Our codes and dataset can be found at https://github.com/GT-STAR-Lab/spec_clip.
Task-Driven Co-Design of Mobile Manipulators
Recent interest in mobile manipulation has resulted in a wide range of new robot designs. A large family of these designs focuses on modular platforms that combine existing mobile bases with static manipulator arms. They combine these modules by mounting the arm in a tabletop configuration. However, the operating workspaces and heights for common mobile manipulation tasks, such as opening articulated objects, significantly differ from tabletop manipulation tasks. As a result, these standard arm mounting configurations can result in kinematics with restricted joint ranges and motions. To address these problems, we present the first Concurrent Design approach for mobile manipulators to optimize key arm-mounting parameters. Our approach directly targets task performance across representative household tasks by training a powerful multitask-capable reinforcement learning policy in an inner loop while optimizing over a distribution of design configurations guided by Bayesian Optimization and HyperBand (BOHB) in an outer loop. This results in novel designs that significantly improve performance across both seen and unseen test tasks, and outperform designs generated by heuristic-based performance indices that are cheaper to evaluate but only weakly correlated with the motions of interest. We evaluate the physical feasibility of the resulting designs and show that they are practical and remain modular, affordable, and compatible with existing commercial components. We open-source the approach and generated designs to facilitate further improvements of these platforms.
comment: Accepted for publication at RA-L. Project website: https://moma-codesign.cs.uni-freiburg.de/
Adaptive Reward Design for Reinforcement Learning UAI 2025
There is a surge of interest in using formal languages such as Linear Temporal Logic (LTL) to precisely and succinctly specify complex tasks and derive reward functions for Reinforcement Learning (RL). However, existing methods often assign sparse rewards (e.g., giving a reward of 1 only if a task is completed and 0 otherwise). By providing feedback solely upon task completion, these methods fail to encourage successful subtask completion. This is particularly problematic in environments with inherent uncertainty, where task completion may be unreliable despite progress on intermediate goals. To address this limitation, we propose a suite of reward functions that incentivize an RL agent to complete a task specified by an LTL formula as much as possible, and develop an adaptive reward shaping approach that dynamically updates reward functions during the learning process. Experimental results on a range of benchmark RL environments demonstrate that the proposed approach generally outperforms baselines, achieving earlier convergence to a better policy with higher expected return and task completion rate.
comment: UAI 2025 Camera Ready Version
Training Strategies for Efficient Embodied Reasoning
Robot chain-of-thought reasoning (CoT) -- wherein a model predicts helpful intermediate representations before choosing actions -- provides an effective method for improving the generalization and performance of robot policies, especially vision-language-action models (VLAs). While such approaches have been shown to improve performance and generalization, they suffer from core limitations, like needing specialized robot reasoning data and slow inference speeds. To design new robot reasoning approaches that address these issues, a more complete characterization of why reasoning helps policy performance is critical. We hypothesize several mechanisms by which robot reasoning improves policies -- (1) better representation learning, (2) improved learning curricularization, and (3) increased expressivity -- then devise simple variants of robot CoT reasoning to isolate and test each one. We find that learning to generate reasonings does lead to better VLA representations, while attending to the reasonings aids in actually leveraging these features for improved action prediction. Our results provide us with a better understanding of why CoT reasoning helps VLAs, which we use to introduce two simple and lightweight alternative recipes for robot reasoning. Our proposed approaches achieve significant performance gains over non-reasoning policies, state-of-the-art results on the LIBERO-90 benchmark, and a 3x inference speedup compared to standard robot reasoning.
comment: Updated figure layout, added project page link
Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping
Imitation learning has enabled robots to perform complex, long-horizon tasks in challenging dexterous manipulation settings. As new methods are developed, they must be rigorously evaluated and compared against corresponding baselines through repeated evaluation trials. However, policy comparison is fundamentally constrained by a small feasible sample size (e.g., 10 or 50) due to significant human effort and limited inference throughput of policies. This paper proposes a novel statistical framework for rigorously comparing two policies in the small sample size regime. Prior work in statistical policy comparison relies on batch testing, which requires a fixed, pre-determined number of trials and lacks flexibility in adapting the sample size to the observed evaluation data. Furthermore, extending the test with additional trials risks inducing inadvertent p-hacking, undermining statistical assurances. In contrast, our proposed statistical test is sequential, allowing researchers to decide whether or not to run more trials based on intermediate results. This adaptively tailors the number of trials to the difficulty of the underlying comparison, saving significant time and effort without sacrificing probabilistic correctness. Extensive numerical simulation and real-world robot manipulation experiments show that our test achieves near-optimal stopping, letting researchers stop evaluation and make a decision in a near-minimal number of trials. Specifically, it reduces the number of evaluation trials by up to 32% as compared to state-of-the-art baselines, while preserving the probabilistic correctness and statistical power of the comparison. Moreover, our method is strongest in the most challenging comparison instances (requiring the most evaluation trials); in a multi-task comparison scenario, we save the evaluator more than 160 simulation rollouts.
comment: 14 + 5 pages, 10 figures, 4 tables. Accepted to RSS 2025
Your Learned Constraint is Secretly a Backward Reachable Tube
Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints.
comment: 14 pages, 4 figures
Pre-training Auto-regressive Robotic Models with 4D Representations
Foundation models pre-trained on massive unlabeled datasets have revolutionized natural language and computer vision, exhibiting remarkable generalization capabilities, thus highlighting the importance of pre-training. Yet, efforts in robotics have struggled to achieve similar success, limited by either the need for costly robotic annotations or the lack of representations that effectively model the physical world. In this paper, we introduce ARM4R, an Auto-regressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better pre-trained robotic model. Specifically, we focus on utilizing 3D point tracking representations from videos derived by lifting 2D representations into 3D space via monocular depth estimation across time. These 4D representations maintain a shared geometric structure between the points and robot state representations up to a linear transformation, enabling efficient transfer learning from human video data to low-level robotic control. Our experiments show that ARM4R can transfer efficiently from human video data to robotics and consistently improves performance on tasks across various robot environments and configurations.
MARFT: Multi-Agent Reinforcement Fine-Tuning
LLM-based Multi-Agent Systems have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticated scientific research. Meanwhile, RL has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of MARL methods to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a brand-new POMDP called Flex-POMDP, which aligns with the LaMAS optimization in real-world applications and a universal algorithmic framework tailored specifically for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We review the evolution from RL to RFT, setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and opening challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work serves as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.
comment: 40 pages
CP-NCBF: A Conformal Prediction-based Approach to Synthesize Verified Neural Control Barrier Functions
Control Barrier Functions (CBFs) are a practical approach for designing safety-critical controllers, but constructing them for arbitrary nonlinear dynamical systems remains a challenge. Recent efforts have explored learning-based methods, such as neural CBFs (NCBFs), to address this issue. However, ensuring the validity of NCBFs is difficult due to potential learning errors. In this letter, we propose a novel framework that leverages split-conformal prediction to generate formally verified neural CBFs with probabilistic guarantees based on a user-defined error rate, referred to as CP-NCBF. Unlike existing methods that impose Lipschitz constraints on neural CBF-leading to scalability limitations and overly conservative safe sets--our approach is sample-efficient, scalable, and results in less restrictive safety regions. We validate our framework through case studies on obstacle avoidance in autonomous driving and geo-fencing of aerial vehicles, demonstrating its ability to generate larger and less conservative safe sets compared to conventional techniques.
comment: 17 Pages, 10 Figures. First two authors have contributed equally
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
PDDL-based symbolic task planning remains pivotal for robot autonomy yet struggles with dynamic human-robot collaboration due to scalability, re-planning demands, and delayed plan availability. Although a few neurosymbolic frameworks have previously leveraged LLMs such as GPT-3 to address these challenges, reliance on closed-source, remote models with limited context introduced critical constraints: third-party dependency, inconsistent response times, restricted plan length and complexity, and multi-domain scalability issues. We present Gideon, a novel framework that enables the transition to modern, smaller, local LLMs with extended context length. Gideon integrates a novel problem generator to systematically generate large-scale datasets of realistic domain-problem-plan tuples for any domain, and adapts neurosymbolic planning for local LLMs, enabling on-device execution and extended context for multi-domain support. Preliminary experiments in single-domain scenarios performed on Qwen-2.5 1.5B and trained on 8k-32k samples, demonstrate a valid plan percentage of 66.1% (32k model) and show that the figure can be further scaled through additional data. Multi-domain tests on 16k samples yield an even higher 70.6% planning validity rate, proving extensibility across domains and signaling that data variety can have a positive effect on learning efficiency. Although long-horizon planning and reduced model size make Gideon training much less efficient than baseline models based on larger LLMs, the results are still significant considering that the trained model is about 120x smaller than baseline and that significant advantages can be achieved in inference efficiency, scalability, and multi-domain adaptability, all critical factors in human-robot collaboration. Training inefficiency can be mitigated by Gideon's streamlined data generation pipeline.
comment: 18 pages, 3 figures, 4 tables, accepted at IAS 2025
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering. Competitive and cooperative gameplay challenges each drone to coordinate with its teammates while anticipating and countering opposing teams' tactics. Turn-based interaction demands precise timing, accurate state prediction, and management of long-horizon temporal dependencies. Agile 3D maneuvering requires rapid accelerations, sharp turns, and precise 3D positioning despite the quadrotor's underactuated dynamics. These intertwined features yield a complex problem combining motion control and strategic play, with no available expert demonstrations. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative multi-agent reinforcement learning (MARL) and game-theoretic algorithms. Simulation results show that on-policy reinforcement learning (RL) methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy which achieves a 69.5% percent win rate against the strongest baseline in the 3 vs 3 task, underscoring its potential as an effective solution for tackling the complex interplay between low-level control and high-level strategy. The project page is at https://sites.google.com/view/thu-volleybots.
Dexterous Manipulation through Imitation Learning: A Survey
Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.
comment: 22pages, 5 figures
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Offline reinforcement learning (RL) enables policy optimization using static datasets, avoiding the risks and costs of extensive real-world exploration. However, it struggles with suboptimal offline behaviors and inaccurate value estimation due to the lack of environmental interaction. We present Video-Enhanced Offline RL (VeoRL), a model-based method that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, our approach transfers commonsense knowledge of control policy and physical dynamics from natural videos to the RL agent within the target domain. VeoRL achieves substantial performance gains (over 100% in some cases) across visual control tasks in robotic manipulation, autonomous driving, and open-world video games.
Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything
Multi-agent trajectory prediction at signalized intersections is crucial for developing efficient intelligent transportation systems and safe autonomous driving systems. Due to the complexity of intersection scenarios and the limitations of single-vehicle perception, the performance of vehicle-centric prediction methods has reached a plateau. In this paper, we introduce an Infrastructure-to-Everything (I2X) collaborative prediction scheme. In this scheme, roadside units (RSUs) independently forecast the future trajectories of all vehicles and transmit these predictions unidirectionally to subscribing vehicles. Building on this scheme, we propose I2XTraj, a dedicated infrastructure-based trajectory prediction model. I2XTraj leverages real-time traffic signal states, prior maneuver strategy knowledge, and multi-agent interactions to generate accurate, joint multi-modal trajectory prediction. First, a continuous signal-informed mechanism is proposed to adaptively process real-time traffic signals to guide trajectory proposal generation under varied intersection configurations. Second, a driving strategy awareness mechanism estimates the joint distribution of maneuver strategies by integrating spatial priors of intersection areas with dynamic vehicle states, enabling coverage of the full set of feasible maneuvers. Third, a spatial-temporal-mode attention network models multi-agent interactions to refine and adjust joint trajectory outputs.Finally, I2XTraj is evaluated on two real-world datasets of signalized intersections, the V2X-Seq and the SinD drone dataset. In both single-infrastructure and online collaborative scenarios, our model outperforms state-of-the-art methods by over 30\% on V2X-Seq and 15\% on SinD, demonstrating strong generalizability and robustness.
Path-Tracking Hybrid A* and Hierarchical MPC Framework for Autonomous Agricultural Vehicles
We propose a Path-Tracking Hybrid A* planner coupled with a hierarchical Model Predictive Control (MPC) framework for path smoothing in agricultural vehicles. The goal is to minimize deviation from reference paths during cross-furrow operations, thereby optimizing operational efficiency, preventing crop and soil damage, while also enforcing curvature constraints and ensuring full-body collision avoidance. Our contributions are threefold: (1) We develop the Path-Tracking Hybrid A* algorithm to generate smooth trajectories that closely adhere to the reference trajectory, respect strict curvature constraints, and satisfy full-body collision avoidance. The adherence is achieved by designing novel cost and heuristic functions to minimize tracking errors under nonholonomic constraints. (2) We introduce an online replanning strategy as an extension that enables real-time avoidance of unforeseen obstacles, while leveraging pruning techniques to enhance computational efficiency. (3) We design a hierarchical MPC framework that ensures tight path adherence and real-time satisfaction of vehicle constraints, including nonholonomic dynamics and full-body collision avoidance. By using linearized MPC to warm-start the nonlinear solver, the framework improves the convergence of nonlinear optimization with minimal loss in accuracy. Simulations on real-world farm datasets demonstrate superior performance compared to baseline methods in safety, path adherence, computation speed, and real-time obstacle avoidance.
Robotic Shepherding in Cluttered and Unknown Environments using Control Barrier Functions
This paper introduces a novel control methodology designed to guide a collective of robotic-sheep in a cluttered and unknown environment using robotic-dogs. The dog-agents continuously scan the environment and compute a safe trajectory to guide the sheep to their final destination. The proposed optimization-based controller guarantees that the sheep reside within a desired distance from the reference trajectory through the use of Control Barrier Functions (CBF). Additional CBF constraints are employed simultaneously to ensure inter-agent and obstacle collision avoidance. The efficacy of the proposed approach is rigorously tested in simulation, which demonstrates the successful herding of the robotic-sheep within complex and cluttered environments.
VISTA: Generative Visual Imagination for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) tasks agents with locating specific objects in unseen environments using natural language instructions and visual cues. Many existing VLN approaches typically follow an 'observe-and-reason' schema, that is, agents observe the environment and decide on the next action to take based on the visual observations of their surroundings. They often face challenges in long-horizon scenarios due to limitations in immediate observation and vision-language modality gaps. To overcome this, we present VISTA, a novel framework that employs an 'imagine-and-align' navigation strategy. Specifically, we leverage the generative prior of pre-trained diffusion models for dynamic visual imagination conditioned on both local observations and high-level language instructions. A Perceptual Alignment Filter module then grounds these goal imaginations against current observations, guiding an interpretable and structured reasoning process for action selection. Experiments show that VISTA sets new state-of-the-art results on Room-to-Room (R2R) and RoboTHOR benchmarks, e.g.,+3.6% increase in Success Rate on R2R. Extensive ablation analysis underscores the value of integrating forward-looking imagination, perceptual alignment, and structured reasoning for robust navigation in long-horizon environments.
comment: 13 pages, 5 figures
Multiagent Systems
Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners
Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and contributing to the process, given that data collection and running a distributed algorithm is costly for the clients. The question of rationality of contribution has been asked recently in the literature and some results exist that consider this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium and simultaneously learn the model parameters. However, this equilibrium outcome can be away from the optimal, where clients contribute with their full data and the algorithm learns the optimal parameters. We propose a second mechanism with monetary transfers that is budget balanced and enables the full data contribution along with optimal parameter learning. Large scale experiments with real (federated) datasets (CIFAR-10, FeMNIST, and Twitter) show that these algorithms converge quite fast in practice, yield good welfare guarantees, and better model performance for all agents.
comment: 19 pages, 12 figures, under review
Interactional Fairness in LLM Multi-Agent Systems: An Evaluation Framework
As large language models (LLMs) are increasingly used in multi-agent systems, questions of fairness should extend beyond resource distribution and procedural design to include the fairness of how agents communicate. Drawing from organizational psychology, we introduce a novel framework for evaluating Interactional fairness encompassing Interpersonal fairness (IF) and Informational fairness (InfF) in LLM-based multi-agent systems (LLM-MAS). We extend the theoretical grounding of Interactional Fairness to non-sentient agents, reframing fairness as a socially interpretable signal rather than a subjective experience. We then adapt established tools from organizational justice research, including Colquitt's Organizational Justice Scale and the Critical Incident Technique, to measure fairness as a behavioral property of agent interaction. We validate our framework through a pilot study using controlled simulations of a resource negotiation task. We systematically manipulate tone, explanation quality, outcome inequality, and task framing (collaborative vs. competitive) to assess how IF influences agent behavior. Results show that tone and justification quality significantly affect acceptance decisions even when objective outcomes are held constant. In addition, the influence of IF vs. InfF varies with context. This work lays the foundation for fairness auditing and norm-sensitive alignment in LLM-MAS.
Modèles de Substitution pour les Modèles à base d'Agents : Enjeux, Méthodes et Applications
Multi-agent simulations enables the modeling and analyses of the dynamic behaviors and interactions of autonomous entities evolving in complex environments. Agent-based models (ABM) are widely used to study emergent phenomena arising from local interactions. However, their high computational cost poses a significant challenge, particularly for large-scale simulations requiring extensive parameter exploration, optimization, or uncertainty quantification. The increasing complexity of ABM limits their feasibility for real-time decision-making and large-scale scenario analysis. To address these limitations, surrogate models offer an efficient alternative by learning approximations from sparse simulation data. These models provide cheap-to-evaluate predictions, significantly reducing computational costs while maintaining accuracy. Various machine learning techniques, including regression models, neural networks, random forests and Gaussian processes, have been applied to construct robust surrogates. Moreover, uncertainty quantification and sensitivity analysis play a crucial role in enhancing model reliability and interpretability. This article explores the motivations, methods, and applications of surrogate modeling for ABM, emphasizing the trade-offs between accuracy, computational efficiency, and interpretability. Through a case study on a segregation model, we highlight the challenges associated with building and validating surrogate models, comparing different approaches and evaluating their performance. Finally, we discuss future perspectives on integrating surrogate models within ABM to improve scalability, explainability, and real-time decision support across various fields such as ecology, urban planning and economics.
comment: 12 pages, in French language. Les 33\`emes Journ\'ees Francophones sur les Syst\`emes Multi-Agents (JFSMA 2025). 2025
OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration
Agents powered by advanced large language models (LLMs) have demonstrated impressive capabilities across diverse complex applications. Recently, Multi-Agent Systems (MAS), wherein multiple agents collaborate and communicate with each other, have exhibited enhanced capabilities in complex tasks, such as high-quality code generation and arithmetic reasoning. However, the development of such systems often relies on handcrafted methods, and the literature on systematic design and optimization of LLM-based MAS remains limited. In this work, we introduce OMAC, a general framework designed for holistic optimization of LLM-based MAS. Specifically, we identify five key optimization dimensions for MAS, encompassing both agent functionality and collaboration structure. Building upon these dimensions, we first propose a general algorithm, utilizing two actors termed the Semantic Initializer and the Contrastive Comparator, to optimize any single dimension. Then, we present an algorithm for joint optimization across multiple dimensions. Extensive experiments demonstrate the superior performance of OMAC on code generation, arithmetic reasoning, and general reasoning tasks against state-of-the-art approaches.
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems
Recent advancements in Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) have demonstrated tremendous potential in diverse task scenarios. Nonetheless, existing agentic systems typically rely on predefined agent-role design spaces and static communication structures, limiting their adaptability as well as flexibility in complex interaction environments and leading to subpar performance on highly specialized and expert-level tasks. To address these issues, we introduce HALO, a multi-agent collaboration framework based on a hierarchical reasoning architecture. Specifically, we incorporate a high-level planning agent for task decomposition, mid-level role-design agents for subtask-specific agent instantiation, and low-level inference agents for subtask execution. Particularly, subtask execution is reformulated as a structured workflow search problem, where Monte Carlo Tree Search (MCTS) systematically explores the agentic action space to construct optimal reasoning trajectories. Additionally, as the majority of users lack expertise in prompt engineering, we leverage an Adaptive Prompt Refinement module to transform raw queries into task-specific prompts. Empirical evaluations on Code Generation (HumanEval), General Reasoning (MMLU), and Arithmetic Reasoning (MATH) benchmark datasets highlight the effectiveness of HALO, yielding a 14.4% average improvement over state-of-the-art baselines. Notably, HALO achieves up to 13.3% performance gain on the Moral Scenarios subject in the MMLU benchmark and up to 19.6% performance gain on the Algebra subarea in the MATH benchmark, indicating its advanced proficiency in tackling highly specialized and expert-level tasks. The code repository is available at https://github.com/23japhone/HALO.
comment: The code repository is available at https://github.com/23japhone/HALO
Benchmarking LLMs' Swarm intelligence
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints-limited local perception and communication-remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit-built on a customizable physical system-providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
MARFT: Multi-Agent Reinforcement Fine-Tuning
LLM-based Multi-Agent Systems have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticated scientific research. Meanwhile, RL has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of MARL methods to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a brand-new POMDP called Flex-POMDP, which aligns with the LaMAS optimization in real-world applications and a universal algorithmic framework tailored specifically for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We review the evolution from RL to RFT, setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and opening challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work serves as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.
comment: 40 pages
Quantifying the Self-Interest Level of Markov Social Dilemmas
This paper introduces a novel method for estimating the self-interest level of Markov social dilemmas. We extend the concept of self-interest level from normal-form games to Markov games, providing a quantitative measure of the minimum reward exchange required to align individual and collective interests. We demonstrate our method on three environments from the Melting Pot suite, representing either common-pool resources or public goods. Our results illustrate how reward exchange can enable agents to transition from selfish to collective equilibria in a Markov social dilemma. This work contributes to multi-agent reinforcement learning by providing a practical tool for analysing complex, multistep social dilemmas. Our findings offer insights into how reward structures can promote or hinder cooperation, with potential applications in areas such as mechanism design.
comment: 7 pages (9 with references), 11 figures
Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything
Multi-agent trajectory prediction at signalized intersections is crucial for developing efficient intelligent transportation systems and safe autonomous driving systems. Due to the complexity of intersection scenarios and the limitations of single-vehicle perception, the performance of vehicle-centric prediction methods has reached a plateau. In this paper, we introduce an Infrastructure-to-Everything (I2X) collaborative prediction scheme. In this scheme, roadside units (RSUs) independently forecast the future trajectories of all vehicles and transmit these predictions unidirectionally to subscribing vehicles. Building on this scheme, we propose I2XTraj, a dedicated infrastructure-based trajectory prediction model. I2XTraj leverages real-time traffic signal states, prior maneuver strategy knowledge, and multi-agent interactions to generate accurate, joint multi-modal trajectory prediction. First, a continuous signal-informed mechanism is proposed to adaptively process real-time traffic signals to guide trajectory proposal generation under varied intersection configurations. Second, a driving strategy awareness mechanism estimates the joint distribution of maneuver strategies by integrating spatial priors of intersection areas with dynamic vehicle states, enabling coverage of the full set of feasible maneuvers. Third, a spatial-temporal-mode attention network models multi-agent interactions to refine and adjust joint trajectory outputs.Finally, I2XTraj is evaluated on two real-world datasets of signalized intersections, the V2X-Seq and the SinD drone dataset. In both single-infrastructure and online collaborative scenarios, our model outperforms state-of-the-art methods by over 30\% on V2X-Seq and 15\% on SinD, demonstrating strong generalizability and robustness.
Ranking Joint Policies in Dynamic Games using Evolutionary Dynamics
Game-theoretic solution concepts, such as the Nash equilibrium, have been key to finding stable joint actions in multi-player games. However, it has been shown that the dynamics of agents' interactions, even in simple two-player games with few strategies, are incapable of reaching Nash equilibria, exhibiting complex and unpredictable behavior. Instead, evolutionary approaches can describe the long-term persistence of strategies and filter out transient ones, accounting for the long-term dynamics of agents' interactions. Our goal is to identify agents' joint strategies that result in stable behavior, being resistant to changes, while also accounting for agents' payoffs, in dynamic games. Towards this goal, and building on previous results, this paper proposes transforming dynamic games into their empirical forms by considering agents' strategies instead of agents' actions, and applying the evolutionary methodology $\alpha$-Rank to evaluate and rank strategy profiles according to their long-term dynamics. This methodology not only allows us to identify joint strategies that are strong through agents' long-term interactions, but also provides a descriptive, transparent framework regarding the high ranking of these strategies. Experiments report on agents that aim to collaboratively solve a stochastic version of the graph coloring problem. We consider different styles of play as strategies to define the empirical game, and train policies realizing these strategies, using the DQN algorithm. Then we run simulations to generate the payoff matrix required by $\alpha$-Rank to rank joint strategies.
Robotic Shepherding in Cluttered and Unknown Environments using Control Barrier Functions
This paper introduces a novel control methodology designed to guide a collective of robotic-sheep in a cluttered and unknown environment using robotic-dogs. The dog-agents continuously scan the environment and compute a safe trajectory to guide the sheep to their final destination. The proposed optimization-based controller guarantees that the sheep reside within a desired distance from the reference trajectory through the use of Control Barrier Functions (CBF). Additional CBF constraints are employed simultaneously to ensure inter-agent and obstacle collision avoidance. The efficacy of the proposed approach is rigorously tested in simulation, which demonstrates the successful herding of the robotic-sheep within complex and cluttered environments.
Systems and Control (CS)
WaLRUS: Wavelets for Long-range Representation Using SSMs
State-Space Models (SSMs) have proven to be powerful tools for modeling long-range dependencies in sequential data. While the recent method known as HiPPO has demonstrated strong performance, and formed the basis for machine learning models S4 and Mamba, it remains limited by its reliance on closed-form solutions for a few specific, well-behaved bases. The SaFARi framework generalized this approach, enabling the construction of SSMs from arbitrary frames, including non-orthogonal and redundant ones, thus allowing an infinite diversity of possible "species" within the SSM family. In this paper, we introduce WaLRUS (Wavelets for Long-range Representation Using SSMs), a new implementation of SaFARi built from Daubechies wavelets.
comment: 15 pages, 8 figures. Submitted to Neurips 2025
Optimal Satellite Maneuvers for Spaceborne Jamming Attacks
Satellites are becoming exceedingly critical for communication, making them prime targets for cyber-physical attacks. We consider a rogue satellite in low Earth orbit that jams the uplink communication between another satellite and a ground station. To achieve maximal interference with minimal fuel consumption, the jammer carefully maneuvers itself relative to the target satellite's antenna. We cast this maneuvering objective as a two-stage optimal control problem, involving i) repositioning to an efficient jamming position before uplink communication commences; and ii) maintaining an efficient jamming position after communication has started. We obtain the optimal maneuvering trajectories for the jammer and perform simulations to show how they enable the disruption of uplink communication with reasonable fuel consumption.
Particle Filtering for Enhanced Parameter Estimation in Bilinear Systems Under Colored Noise
This paper addresses the challenging problem of parameter estimation in bilinear systems under colored noise. A novel approach, termed B-PF-RLS, is proposed, combining a particle filter (PF) with a recursive least squares (RLS) estimator. The B-PF-RLS algorithm tackles the complexities arising from system nonlinearities and colored noise by effectively estimating unknown system states using the particle filter, which are then integrated into the RLS parameter estimation process. Furthermore, the paper introduces an enhanced particle filter that eliminates the need for explicit knowledge of the measurement noise variance, enhancing the method's practicality for real-world applications. Numerical examples demonstrate the B-PF-RLS algorithm's superior performance in accurately estimating both system parameters and states, even under uncertain noise conditions. This work offers a robust and effective solution for system identification in various engineering applications involving bilinear models subject to complex noise environments.
Reach-avoid games for players with damped double integrator dynamics
This paper studies a reach-avoid game of two damped double integrator players. An attacker aims to reach a static target, while a faster defender tries to protect the target by intercepting the attacker before it reaches the target. In scenarios where the defender succeeds, the defender aims to maximize the attacker's final distance from the target, while the attacker aims to minimize it. This work focuses on determining the equilibrium strategy in the defender-winning scenarios. The optimal state feedback strategy is obtained by a differential game approach combining geometric analysis. We construct a multiple reachable region to analyse the damped double integrator player's motion under optimal strategy. Building on this, a new type of the attacker's dominance region is introduced for the first time. It is shown that different strategies are required when the terminal point lies in distinct areas of the attacker's dominance region. Then, a necessary condition is derived for the proposed strategy to be optimal using differential game approach. Furthermore, a case where both players start at rest is discussed, and some useful properties about the dominance region and the optimal strategy are presented. Simulations are conducted to show the effectiveness of the proposed strategy.
PyScrew: A Comprehensive Dataset Collection from Industrial Screw Driving Experiments
This paper presents a comprehensive collection of industrial screw driving datasets designed to advance research in manufacturing process monitoring and quality control. The collection comprises six distinct datasets with over 34,000 individual screw driving operations conducted under controlled experimental conditions, capturing the multifaceted nature of screw driving processes in plastic components. Each dataset systematically investigates specific aspects: natural thread degradation patterns through repeated use (s01), variations in surface friction conditions including contamination and surface treatments (s02), diverse assembly faults with up to 27 error types (s03-s04), and fabrication parameter variations in both upper and lower workpieces through modified injection molding settings (s05-s06). We detail the standardized experimental setup used across all datasets, including hardware specifications, process phases, and data acquisition methods. The hierarchical data model preserves the temporal and operational structure of screw driving processes, facilitating both exploratory analysis and the development of machine learning models. To maximize accessibility, we provide dual access pathways: raw data through Zenodo with a persistent DOI, and a purpose-built Python library (PyScrew) that offers consistent interfaces for data loading, preprocessing, and integration with common analysis workflows. These datasets serve diverse research applications including anomaly detection, predictive maintenance, quality control system development, feature extraction methodology evaluation, and classification of specific error conditions. By addressing the scarcity of standardized, comprehensive datasets in industrial manufacturing, this collection enables reproducible research and fair comparison of analytical approaches in an area of growing importance for industrial automation.
comment: 18 pages, 2 figures, 7 tables
Model-free Dynamic Mode Adaptive Control using Matrix RLS
This paper presents a novel, model-free, data-driven control synthesis technique known as dynamic mode adaptive control (DMAC) for synthesizing controllers for complex systems whose mathematical models are not suitable for classical control design. DMAC consists of a dynamics approximation module and a controller module. The dynamics approximation module is motivated by data-driven reduced-order modeling techniques and directly approximates the system's dynamics in state-space form using a matrix version of the recursive least squares algorithm. The controller module includes an output tracking controller that utilizes sparse measurements from the system to generate the control signal. The DMAC controller design technique is demonstrated through various dynamic systems commonly found in engineering applications. A systematic sensitivity study demonstrates the robustness of DMAC with respect to its own hyperparameters and the system's parameters.
Robustness of Incentive Mechanisms Against System Misspecification in Congestion Games
To steer the behavior of selfish, resource-sharing agents in a socio-technical system towards the direction of higher efficiency, the system designer requires accurate models of both agent behaviors and the underlying system infrastructure. For instance, traffic controllers often use road latency models to design tolls whose deployment can effectively mitigate traffic congestion. However, misspecifications of system parameters may restrict a system designer's ability to influence collective agent behavior toward efficient outcomes. In this work, we study the impact of system misspecifications on toll design for atomic congestion games. We prove that tolls designed under sufficiently minor system misspecifications, when deployed, do not introduce new Nash equilibria in atomic congestion games compared to tolls designed in the noise-free setting, implying a form of local robustness. We then upper bound the degree to which the worst-case equilibrium system performance could decrease when tolls designed under a given level of system misspecification are deployed. We validate our theoretical results via Monte-Carlo simulations as well as realizations of our worst-case guarantees.
comment: 6 pages, 3 figures
Stability and Convergence Analysis of Multi-Agent Consensus with Communication Delays: A Lambert W Function Approach
This paper investigates the effect of constant time delay in weakly connected multi-agent systems modeled by double integrator dynamics. A novel analytical approach is proposed to establish an upper bound on the permissible time delay that ensures stability and consensus convergence. The analysis employs the Lambert W function method in higher-dimensional systems to derive explicit conditions under which consensus is achieved. The theoretical results are rigorously proven and provide insight into the allowable delay margins. The analysis applies to general leaderless undirected network topologies. The framework also accounts for complex and realistic delays, including non-commensurate communication delays. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
comment: short version submitted to IEEE L-CSS and CDC 2025
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: 21 pages, 8 figures, 2 author photos. Affiliations: Electrical Engineering Department, Penn State University. Keywords: Economic Nonlinear Model Predictive Control, (e)NMPC, Reinforcement Learning, Surrogate Models, System Identification
On the Surprising Effectiveness of Spectrum Clipping in Learning Stable Linear Dynamics
When learning stable linear dynamical systems from data, three important properties are desirable: i) predictive accuracy, ii) provable stability, and iii) computational efficiency. Unconstrained minimization of reconstruction errors leads to high accuracy and efficiency but cannot guarantee stability. Existing methods to enforce stability often preserve accuracy, but do so only at the cost of increased computation. In this work, we investigate if a straightforward approach can simultaneously offer all three desiderata of learning stable linear systems. Specifically, we consider a post-hoc approach that manipulates the spectrum of the learned system matrix that was computed using unconstrained least squares. We call this approach spectrum clipping (SC) as it involves eigen decomposition and subsequent reconstruction of the system matrix after clipping any eigenvalues that are larger than one to one (without altering the eigenvectors). Through comprehensive experiments involving two different applications and publicly available benchmark datasets, we show that this simple technique can efficiently learn highly-accurate linear systems that are provably-stable. Notably, we find that SC can match or outperform strong baselines while being orders-of-magnitude faster. We also show that SC can be readily combined with Koopman operators to learn stable nonlinear dynamics, such as those underlying complex dexterous manipulation skills involving multi-fingered robotic hands. Finally, we find that SC can learn stable robot policies even when the training data includes unsuccessful or truncated demonstrations. Our codes and dataset can be found at https://github.com/GT-STAR-Lab/spec_clip.
Adaptive Reinforcement Learning for State Avoidance in Discrete Event Systems
Reinforcement learning (RL) has emerged as a potent paradigm for autonomous decision-making in complex environments. However, the integration of event-driven decision processes within RL remains a challenge. This paper presents a novel architecture that combines a Discrete Event Supervisory (DES) model with a standard RL framework to create a hybrid decision-making system. Our model leverages the DES's capabilities in managing event-based dynamics with the RL agent's adaptability to continuous states and actions, facilitating a more robust and flexible control strategy in systems characterized by both continuous and discrete events. The DES model operates alongside the RL agent, enhancing the policy's performance with event-based insights, while the environment's state transitions are governed by a mechanistic model. We demonstrate the efficacy of our approach through simulations that show improved performance metrics over traditional RL implementations. Our results suggest that this integrated approach holds promise for applications ranging from industrial automation to intelligent traffic systems, where discrete event handling is paramount.
comment: 8 pages, 5 figures. Preprint version
(Corrected Version) Push-LSVRG-UP: Distributed Stochastic Optimization over Unbalanced Directed Networks with Uncoordinated Triggered Probabilities
Distributed stochastic optimization, arising in the crossing and integration of traditional stochastic optimization, distributed computing and storage, and network science, has advantages of high efficiency and a low per-iteration computational complexity in resolving large-scale optimization problems. This paper concentrates on resolving a large-scale convex finite-sum optimization problem in a multi-agent system over unbalanced directed networks. To tackle this problem in an efficient way, a distributed consensus optimization algorithm, adopting the push-sum technique and a distributed loopless stochastic variance-reduced gradient (LSVRG) method with uncoordinated triggered probabilities, is developed and named Push-LSVRG-UP. Each agent under this algorithmic framework performs only local computation and communicates only with its neighbors without leaking their private information. The convergence analysis of Push-LSVRG-UP is relied on analyzing the contraction relationships between four error terms associated with the multi-agent system. Theoretical results provide an explicit feasible range of the constant step-size, a linear convergence rate, and an iteration complexity of Push-LSVRG-UP when achieving the globally optimal solution. It is shown that Push-LSVRG-UP achieves the superior characteristics of accelerated linear convergence, fewer storage costs, and a lower per-iteration computational complexity than most existing works. Meanwhile, the introduction of an uncoordinated probabilistic triggered mechanism allows Push-LSVRG-UP to facilitate the independence and flexibility of agents in computing local batch gradients. In simulations, the practicability and improved performance of Push-LSVRG-UP are manifested via resolving two distributed learning problems based on real-world datasets.
comment: 16 pages, 30 figures
CP-NCBF: A Conformal Prediction-based Approach to Synthesize Verified Neural Control Barrier Functions
Control Barrier Functions (CBFs) are a practical approach for designing safety-critical controllers, but constructing them for arbitrary nonlinear dynamical systems remains a challenge. Recent efforts have explored learning-based methods, such as neural CBFs (NCBFs), to address this issue. However, ensuring the validity of NCBFs is difficult due to potential learning errors. In this letter, we propose a novel framework that leverages split-conformal prediction to generate formally verified neural CBFs with probabilistic guarantees based on a user-defined error rate, referred to as CP-NCBF. Unlike existing methods that impose Lipschitz constraints on neural CBF-leading to scalability limitations and overly conservative safe sets--our approach is sample-efficient, scalable, and results in less restrictive safety regions. We validate our framework through case studies on obstacle avoidance in autonomous driving and geo-fencing of aerial vehicles, demonstrating its ability to generate larger and less conservative safe sets compared to conventional techniques.
comment: 17 Pages, 10 Figures. First two authors have contributed equally
Matrix-Valued Measures and Wishart Statistics for Target Tracking Applications
Ensuring sufficiently accurate models is crucial in target tracking systems. If the assumed models deviate too much from the truth, the tracking performance might be severely degraded. While the models are usually defined using multivariate conditions, the measures used to validate them are most often scalar-valued. In this paper, we propose matrix-valued measures for both offline and online assessment of target tracking systems. Recent results from Wishart statistics, and approximations thereof, are adapted and it is shown how these can be incorporated to infer statistical properties for the eigenvalues of the proposed measures. In addition, we relate these results to the statistics of the baseline measures. Finally, the applicability of the proposed measures are demonstrated using two important problems in target tracking: (i) distributed track fusion design; and (ii) filter model mismatch detection.
comment: 12 pages. Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems
Two-Player Zero-Sum Hybrid Games
In this paper, we formulate a two-player zero-sum game under dynamic constraints defined by hybrid dynamical equations. The game consists of a min-max problem involving a cost functional that depends on the actions and resulting solutions to the hybrid system, defined as functions of hybrid time and, hence, can flow or jump. A terminal set conveniently defined allows to recast both finite and infinite horizon problems. We present sufficient conditions given in terms of Hamilton-Jacobi-Bellman-Isaacs-like equations to guarantee to attain a solution to the game. It is shown that when the players select the optimal strategy, the value function can be evaluated without computing solutions to the hybrid system. Under additional conditions, we show that the optimal state-feedback laws render a set of interest asymptotically stable for the resulting hybrid closed-loop system. Applications of these games, presented here as robust control problems, include disturbance rejection and security problems.
comment: 21 pages
Adaptive Control of Dubins Vehicle in the Presence of Loss of Effectiveness (Extended Version)
The control of a Dubins Vehicle when subjected to a loss of control effectiveness in the turning rate is considered. A complex state-space representation is used to model the vehicle dynamics. An adaptive control design is proposed, with the underlying stability analysis guaranteeing closed-loop boundedness and tracking of a desired path. It is shown that a path constructed by waypoints and a minimum turn radius can be specified using a reference model which can be followed by the closed loop system. The control design utilizes the complex representation as well as a PID controller for the nominal closed-loop. How the design can be modified to ensure to ensure path following even in the presence input constraints is also discussed. Simulation studies are carried out to complement the theoretical derivations.
comment: Submitted to the 64th IEEE Conference on Decision and Control, 2025
System Identification Under Bounded Noise: Optimal Rates Beyond Least Squares
System identification is a fundamental problem in control and learning, particularly in high-stakes applications where data efficiency is critical. Classical approaches, such as the ordinary least squares estimator (OLS), achieve an $O(1/\sqrt{T})$ convergence rate under Gaussian noise assumptions, where $T$ is the number of samples. This rate has been shown to match the lower bound. However, in many practical scenarios, noise is known to be bounded, opening the possibility of improving sample complexity. In this work, we establish the minimax lower bound for system identification under bounded noise, proving that the $O(1/T)$ convergence rate is indeed optimal. We further demonstrate that OLS remains limited to an $\Omega(1/\sqrt{T})$ convergence rate, making it fundamentally suboptimal in the presence of bounded noise. Finally, we instantiate two natural variations of OLS that obtain the optimal sample complexity.
An Overview of Automated Vehicle Longitudinal Platoon Formation Strategies
Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of the AV platooning strategies has been established and practical applications are being tested under real-world conditions. The emergence of sensors, communication, and control strategies has resulted in rapid and constant evolution of AV platooning strategies. In this paper, we review the state-of-the-art knowledge in AV longitudinal platoon formation using a five-component platooning framework, which includes vehicle model, information-receiving process, information flow topology, spacing policy, and controller and discuss the advantages and limitations of the components. Based on the discussion about existing strategies and associated limitations, potential future research directions are presented.
Reinforcement Learning for Control of Evolutionary and Ecological Processes
As Evolutionary Dynamics moves from the realm of theory into application, algorithms are needed to move beyond simple models. Yet few such methods exist in the literature. Ecological and physiological factors are known to be central to evolution in realistic contexts, but accounting for them generally renders problems intractable to existing methods. We introduce a formulation of evolutionary games which accounts for ecology and physiology by modeling both as computations and use this to analyze the problem of directed evolution via methods from Reinforcement Learning. This combination enables us to develop first-of-their-kind results on the algorithmic problem of learning to control an evolving population of cells. We prove a complexity bound on eco-evolutionary control in situations with limited prior knowledge of cellular physiology or ecology, give the first results on the most general version of the mathematical problem of directed evolution, and establish a new link between AI and biology.
comment: 14 pages, 10 page appendix
Robust Path Recommendations During Public Transit Disruptions Under Demand Uncertainty
When there are significant service disruptions in public transit systems, passengers usually need guidance to find alternative paths. This paper proposes a path recommendation model to mitigate congestion during public transit disruptions. Passengers with different origins, destinations, and departure times are recommended with different paths such that the system travel time is minimized. We model the path recommendation problem as an optimal flow problem with uncertain demand information. To tackle the lack of analytical formulation of travel times due to capacity constraints, we propose a simulation-based first-order approximation to transform the original problem into a linear program. Uncertainties in demand are modeled using robust optimization to protect the path recommendation strategies against inaccurate estimates. A real-world rail disruption scenario in the Chicago Transit Authority (CTA) system is used as a case study. Results show that even without considering uncertainty, the nominal model can reduce the system travel time by 9.1% (compared to the status quo), and outperforms the benchmark capacity-based path recommendation. The average travel time of passengers in the incident line (i.e., passengers receiving recommendations) is reduced more (-20.6% compared to the status quo). After incorporating the demand uncertainty, the robust model can further reduce system travel times. The best robust model can decrease the average travel time of incident-line passengers by 2.91% compared to the nominal model. The improvement of robust models is more prominent when the actual demand pattern is close to the worst-case demand.
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: arXiv admin note: This version has been removed by arXiv administrators due to copyright infringement and inappropriate text reuse from external sources
Systems and Control (EESS)
WaLRUS: Wavelets for Long-range Representation Using SSMs
State-Space Models (SSMs) have proven to be powerful tools for modeling long-range dependencies in sequential data. While the recent method known as HiPPO has demonstrated strong performance, and formed the basis for machine learning models S4 and Mamba, it remains limited by its reliance on closed-form solutions for a few specific, well-behaved bases. The SaFARi framework generalized this approach, enabling the construction of SSMs from arbitrary frames, including non-orthogonal and redundant ones, thus allowing an infinite diversity of possible "species" within the SSM family. In this paper, we introduce WaLRUS (Wavelets for Long-range Representation Using SSMs), a new implementation of SaFARi built from Daubechies wavelets.
comment: 15 pages, 8 figures. Submitted to Neurips 2025
Optimal Satellite Maneuvers for Spaceborne Jamming Attacks
Satellites are becoming exceedingly critical for communication, making them prime targets for cyber-physical attacks. We consider a rogue satellite in low Earth orbit that jams the uplink communication between another satellite and a ground station. To achieve maximal interference with minimal fuel consumption, the jammer carefully maneuvers itself relative to the target satellite's antenna. We cast this maneuvering objective as a two-stage optimal control problem, involving i) repositioning to an efficient jamming position before uplink communication commences; and ii) maintaining an efficient jamming position after communication has started. We obtain the optimal maneuvering trajectories for the jammer and perform simulations to show how they enable the disruption of uplink communication with reasonable fuel consumption.
Particle Filtering for Enhanced Parameter Estimation in Bilinear Systems Under Colored Noise
This paper addresses the challenging problem of parameter estimation in bilinear systems under colored noise. A novel approach, termed B-PF-RLS, is proposed, combining a particle filter (PF) with a recursive least squares (RLS) estimator. The B-PF-RLS algorithm tackles the complexities arising from system nonlinearities and colored noise by effectively estimating unknown system states using the particle filter, which are then integrated into the RLS parameter estimation process. Furthermore, the paper introduces an enhanced particle filter that eliminates the need for explicit knowledge of the measurement noise variance, enhancing the method's practicality for real-world applications. Numerical examples demonstrate the B-PF-RLS algorithm's superior performance in accurately estimating both system parameters and states, even under uncertain noise conditions. This work offers a robust and effective solution for system identification in various engineering applications involving bilinear models subject to complex noise environments.
Reach-avoid games for players with damped double integrator dynamics
This paper studies a reach-avoid game of two damped double integrator players. An attacker aims to reach a static target, while a faster defender tries to protect the target by intercepting the attacker before it reaches the target. In scenarios where the defender succeeds, the defender aims to maximize the attacker's final distance from the target, while the attacker aims to minimize it. This work focuses on determining the equilibrium strategy in the defender-winning scenarios. The optimal state feedback strategy is obtained by a differential game approach combining geometric analysis. We construct a multiple reachable region to analyse the damped double integrator player's motion under optimal strategy. Building on this, a new type of the attacker's dominance region is introduced for the first time. It is shown that different strategies are required when the terminal point lies in distinct areas of the attacker's dominance region. Then, a necessary condition is derived for the proposed strategy to be optimal using differential game approach. Furthermore, a case where both players start at rest is discussed, and some useful properties about the dominance region and the optimal strategy are presented. Simulations are conducted to show the effectiveness of the proposed strategy.
PyScrew: A Comprehensive Dataset Collection from Industrial Screw Driving Experiments
This paper presents a comprehensive collection of industrial screw driving datasets designed to advance research in manufacturing process monitoring and quality control. The collection comprises six distinct datasets with over 34,000 individual screw driving operations conducted under controlled experimental conditions, capturing the multifaceted nature of screw driving processes in plastic components. Each dataset systematically investigates specific aspects: natural thread degradation patterns through repeated use (s01), variations in surface friction conditions including contamination and surface treatments (s02), diverse assembly faults with up to 27 error types (s03-s04), and fabrication parameter variations in both upper and lower workpieces through modified injection molding settings (s05-s06). We detail the standardized experimental setup used across all datasets, including hardware specifications, process phases, and data acquisition methods. The hierarchical data model preserves the temporal and operational structure of screw driving processes, facilitating both exploratory analysis and the development of machine learning models. To maximize accessibility, we provide dual access pathways: raw data through Zenodo with a persistent DOI, and a purpose-built Python library (PyScrew) that offers consistent interfaces for data loading, preprocessing, and integration with common analysis workflows. These datasets serve diverse research applications including anomaly detection, predictive maintenance, quality control system development, feature extraction methodology evaluation, and classification of specific error conditions. By addressing the scarcity of standardized, comprehensive datasets in industrial manufacturing, this collection enables reproducible research and fair comparison of analytical approaches in an area of growing importance for industrial automation.
comment: 18 pages, 2 figures, 7 tables
Model-free Dynamic Mode Adaptive Control using Matrix RLS
This paper presents a novel, model-free, data-driven control synthesis technique known as dynamic mode adaptive control (DMAC) for synthesizing controllers for complex systems whose mathematical models are not suitable for classical control design. DMAC consists of a dynamics approximation module and a controller module. The dynamics approximation module is motivated by data-driven reduced-order modeling techniques and directly approximates the system's dynamics in state-space form using a matrix version of the recursive least squares algorithm. The controller module includes an output tracking controller that utilizes sparse measurements from the system to generate the control signal. The DMAC controller design technique is demonstrated through various dynamic systems commonly found in engineering applications. A systematic sensitivity study demonstrates the robustness of DMAC with respect to its own hyperparameters and the system's parameters.
Robustness of Incentive Mechanisms Against System Misspecification in Congestion Games
To steer the behavior of selfish, resource-sharing agents in a socio-technical system towards the direction of higher efficiency, the system designer requires accurate models of both agent behaviors and the underlying system infrastructure. For instance, traffic controllers often use road latency models to design tolls whose deployment can effectively mitigate traffic congestion. However, misspecifications of system parameters may restrict a system designer's ability to influence collective agent behavior toward efficient outcomes. In this work, we study the impact of system misspecifications on toll design for atomic congestion games. We prove that tolls designed under sufficiently minor system misspecifications, when deployed, do not introduce new Nash equilibria in atomic congestion games compared to tolls designed in the noise-free setting, implying a form of local robustness. We then upper bound the degree to which the worst-case equilibrium system performance could decrease when tolls designed under a given level of system misspecification are deployed. We validate our theoretical results via Monte-Carlo simulations as well as realizations of our worst-case guarantees.
comment: 6 pages, 3 figures
Stability and Convergence Analysis of Multi-Agent Consensus with Communication Delays: A Lambert W Function Approach
This paper investigates the effect of constant time delay in weakly connected multi-agent systems modeled by double integrator dynamics. A novel analytical approach is proposed to establish an upper bound on the permissible time delay that ensures stability and consensus convergence. The analysis employs the Lambert W function method in higher-dimensional systems to derive explicit conditions under which consensus is achieved. The theoretical results are rigorously proven and provide insight into the allowable delay margins. The analysis applies to general leaderless undirected network topologies. The framework also accounts for complex and realistic delays, including non-commensurate communication delays. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
comment: short version submitted to IEEE L-CSS and CDC 2025
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: 21 pages, 8 figures, 2 author photos. Affiliations: Electrical Engineering Department, Penn State University. Keywords: Economic Nonlinear Model Predictive Control, (e)NMPC, Reinforcement Learning, Surrogate Models, System Identification
On the Surprising Effectiveness of Spectrum Clipping in Learning Stable Linear Dynamics
When learning stable linear dynamical systems from data, three important properties are desirable: i) predictive accuracy, ii) provable stability, and iii) computational efficiency. Unconstrained minimization of reconstruction errors leads to high accuracy and efficiency but cannot guarantee stability. Existing methods to enforce stability often preserve accuracy, but do so only at the cost of increased computation. In this work, we investigate if a straightforward approach can simultaneously offer all three desiderata of learning stable linear systems. Specifically, we consider a post-hoc approach that manipulates the spectrum of the learned system matrix that was computed using unconstrained least squares. We call this approach spectrum clipping (SC) as it involves eigen decomposition and subsequent reconstruction of the system matrix after clipping any eigenvalues that are larger than one to one (without altering the eigenvectors). Through comprehensive experiments involving two different applications and publicly available benchmark datasets, we show that this simple technique can efficiently learn highly-accurate linear systems that are provably-stable. Notably, we find that SC can match or outperform strong baselines while being orders-of-magnitude faster. We also show that SC can be readily combined with Koopman operators to learn stable nonlinear dynamics, such as those underlying complex dexterous manipulation skills involving multi-fingered robotic hands. Finally, we find that SC can learn stable robot policies even when the training data includes unsuccessful or truncated demonstrations. Our codes and dataset can be found at https://github.com/GT-STAR-Lab/spec_clip.
Adaptive Reinforcement Learning for State Avoidance in Discrete Event Systems
Reinforcement learning (RL) has emerged as a potent paradigm for autonomous decision-making in complex environments. However, the integration of event-driven decision processes within RL remains a challenge. This paper presents a novel architecture that combines a Discrete Event Supervisory (DES) model with a standard RL framework to create a hybrid decision-making system. Our model leverages the DES's capabilities in managing event-based dynamics with the RL agent's adaptability to continuous states and actions, facilitating a more robust and flexible control strategy in systems characterized by both continuous and discrete events. The DES model operates alongside the RL agent, enhancing the policy's performance with event-based insights, while the environment's state transitions are governed by a mechanistic model. We demonstrate the efficacy of our approach through simulations that show improved performance metrics over traditional RL implementations. Our results suggest that this integrated approach holds promise for applications ranging from industrial automation to intelligent traffic systems, where discrete event handling is paramount.
comment: 8 pages, 5 figures. Preprint version
(Corrected Version) Push-LSVRG-UP: Distributed Stochastic Optimization over Unbalanced Directed Networks with Uncoordinated Triggered Probabilities
Distributed stochastic optimization, arising in the crossing and integration of traditional stochastic optimization, distributed computing and storage, and network science, has advantages of high efficiency and a low per-iteration computational complexity in resolving large-scale optimization problems. This paper concentrates on resolving a large-scale convex finite-sum optimization problem in a multi-agent system over unbalanced directed networks. To tackle this problem in an efficient way, a distributed consensus optimization algorithm, adopting the push-sum technique and a distributed loopless stochastic variance-reduced gradient (LSVRG) method with uncoordinated triggered probabilities, is developed and named Push-LSVRG-UP. Each agent under this algorithmic framework performs only local computation and communicates only with its neighbors without leaking their private information. The convergence analysis of Push-LSVRG-UP is relied on analyzing the contraction relationships between four error terms associated with the multi-agent system. Theoretical results provide an explicit feasible range of the constant step-size, a linear convergence rate, and an iteration complexity of Push-LSVRG-UP when achieving the globally optimal solution. It is shown that Push-LSVRG-UP achieves the superior characteristics of accelerated linear convergence, fewer storage costs, and a lower per-iteration computational complexity than most existing works. Meanwhile, the introduction of an uncoordinated probabilistic triggered mechanism allows Push-LSVRG-UP to facilitate the independence and flexibility of agents in computing local batch gradients. In simulations, the practicability and improved performance of Push-LSVRG-UP are manifested via resolving two distributed learning problems based on real-world datasets.
comment: 16 pages, 30 figures
CP-NCBF: A Conformal Prediction-based Approach to Synthesize Verified Neural Control Barrier Functions
Control Barrier Functions (CBFs) are a practical approach for designing safety-critical controllers, but constructing them for arbitrary nonlinear dynamical systems remains a challenge. Recent efforts have explored learning-based methods, such as neural CBFs (NCBFs), to address this issue. However, ensuring the validity of NCBFs is difficult due to potential learning errors. In this letter, we propose a novel framework that leverages split-conformal prediction to generate formally verified neural CBFs with probabilistic guarantees based on a user-defined error rate, referred to as CP-NCBF. Unlike existing methods that impose Lipschitz constraints on neural CBF-leading to scalability limitations and overly conservative safe sets--our approach is sample-efficient, scalable, and results in less restrictive safety regions. We validate our framework through case studies on obstacle avoidance in autonomous driving and geo-fencing of aerial vehicles, demonstrating its ability to generate larger and less conservative safe sets compared to conventional techniques.
comment: 17 Pages, 10 Figures. First two authors have contributed equally
Matrix-Valued Measures and Wishart Statistics for Target Tracking Applications
Ensuring sufficiently accurate models is crucial in target tracking systems. If the assumed models deviate too much from the truth, the tracking performance might be severely degraded. While the models are usually defined using multivariate conditions, the measures used to validate them are most often scalar-valued. In this paper, we propose matrix-valued measures for both offline and online assessment of target tracking systems. Recent results from Wishart statistics, and approximations thereof, are adapted and it is shown how these can be incorporated to infer statistical properties for the eigenvalues of the proposed measures. In addition, we relate these results to the statistics of the baseline measures. Finally, the applicability of the proposed measures are demonstrated using two important problems in target tracking: (i) distributed track fusion design; and (ii) filter model mismatch detection.
comment: 12 pages. Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems
Two-Player Zero-Sum Hybrid Games
In this paper, we formulate a two-player zero-sum game under dynamic constraints defined by hybrid dynamical equations. The game consists of a min-max problem involving a cost functional that depends on the actions and resulting solutions to the hybrid system, defined as functions of hybrid time and, hence, can flow or jump. A terminal set conveniently defined allows to recast both finite and infinite horizon problems. We present sufficient conditions given in terms of Hamilton-Jacobi-Bellman-Isaacs-like equations to guarantee to attain a solution to the game. It is shown that when the players select the optimal strategy, the value function can be evaluated without computing solutions to the hybrid system. Under additional conditions, we show that the optimal state-feedback laws render a set of interest asymptotically stable for the resulting hybrid closed-loop system. Applications of these games, presented here as robust control problems, include disturbance rejection and security problems.
comment: 21 pages
Adaptive Control of Dubins Vehicle in the Presence of Loss of Effectiveness (Extended Version)
The control of a Dubins Vehicle when subjected to a loss of control effectiveness in the turning rate is considered. A complex state-space representation is used to model the vehicle dynamics. An adaptive control design is proposed, with the underlying stability analysis guaranteeing closed-loop boundedness and tracking of a desired path. It is shown that a path constructed by waypoints and a minimum turn radius can be specified using a reference model which can be followed by the closed loop system. The control design utilizes the complex representation as well as a PID controller for the nominal closed-loop. How the design can be modified to ensure to ensure path following even in the presence input constraints is also discussed. Simulation studies are carried out to complement the theoretical derivations.
comment: Submitted to the 64th IEEE Conference on Decision and Control, 2025
System Identification Under Bounded Noise: Optimal Rates Beyond Least Squares
System identification is a fundamental problem in control and learning, particularly in high-stakes applications where data efficiency is critical. Classical approaches, such as the ordinary least squares estimator (OLS), achieve an $O(1/\sqrt{T})$ convergence rate under Gaussian noise assumptions, where $T$ is the number of samples. This rate has been shown to match the lower bound. However, in many practical scenarios, noise is known to be bounded, opening the possibility of improving sample complexity. In this work, we establish the minimax lower bound for system identification under bounded noise, proving that the $O(1/T)$ convergence rate is indeed optimal. We further demonstrate that OLS remains limited to an $\Omega(1/\sqrt{T})$ convergence rate, making it fundamentally suboptimal in the presence of bounded noise. Finally, we instantiate two natural variations of OLS that obtain the optimal sample complexity.
An Overview of Automated Vehicle Longitudinal Platoon Formation Strategies
Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of the AV platooning strategies has been established and practical applications are being tested under real-world conditions. The emergence of sensors, communication, and control strategies has resulted in rapid and constant evolution of AV platooning strategies. In this paper, we review the state-of-the-art knowledge in AV longitudinal platoon formation using a five-component platooning framework, which includes vehicle model, information-receiving process, information flow topology, spacing policy, and controller and discuss the advantages and limitations of the components. Based on the discussion about existing strategies and associated limitations, potential future research directions are presented.
Reinforcement Learning for Control of Evolutionary and Ecological Processes
As Evolutionary Dynamics moves from the realm of theory into application, algorithms are needed to move beyond simple models. Yet few such methods exist in the literature. Ecological and physiological factors are known to be central to evolution in realistic contexts, but accounting for them generally renders problems intractable to existing methods. We introduce a formulation of evolutionary games which accounts for ecology and physiology by modeling both as computations and use this to analyze the problem of directed evolution via methods from Reinforcement Learning. This combination enables us to develop first-of-their-kind results on the algorithmic problem of learning to control an evolving population of cells. We prove a complexity bound on eco-evolutionary control in situations with limited prior knowledge of cellular physiology or ecology, give the first results on the most general version of the mathematical problem of directed evolution, and establish a new link between AI and biology.
comment: 14 pages, 10 page appendix
Robust Path Recommendations During Public Transit Disruptions Under Demand Uncertainty
When there are significant service disruptions in public transit systems, passengers usually need guidance to find alternative paths. This paper proposes a path recommendation model to mitigate congestion during public transit disruptions. Passengers with different origins, destinations, and departure times are recommended with different paths such that the system travel time is minimized. We model the path recommendation problem as an optimal flow problem with uncertain demand information. To tackle the lack of analytical formulation of travel times due to capacity constraints, we propose a simulation-based first-order approximation to transform the original problem into a linear program. Uncertainties in demand are modeled using robust optimization to protect the path recommendation strategies against inaccurate estimates. A real-world rail disruption scenario in the Chicago Transit Authority (CTA) system is used as a case study. Results show that even without considering uncertainty, the nominal model can reduce the system travel time by 9.1% (compared to the status quo), and outperforms the benchmark capacity-based path recommendation. The average travel time of passengers in the incident line (i.e., passengers receiving recommendations) is reduced more (-20.6% compared to the status quo). After incorporating the demand uncertainty, the robust model can further reduce system travel times. The best robust model can decrease the average travel time of incident-line passengers by 2.91% compared to the nominal model. The improvement of robust models is more prominent when the actual demand pattern is close to the worst-case demand.
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: arXiv admin note: This version has been removed by arXiv administrators due to copyright infringement and inappropriate text reuse from external sources
Robotics
Bracing for Impact: Robust Humanoid Push Recovery and Locomotion with Reduced Order Models
Push recovery during locomotion will facilitate the deployment of humanoid robots in human-centered environments. In this paper, we present a unified framework for walking control and push recovery for humanoid robots, leveraging the arms for push recovery while dynamically walking. The key innovation is to use the environment, such as walls, to facilitate push recovery by combining Single Rigid Body model predictive control (SRB-MPC) with Hybrid Linear Inverted Pendulum (HLIP) dynamics to enable robust locomotion, push detection, and recovery by utilizing the robot's arms to brace against such walls and dynamically adjusting the desired contact forces and stepping patterns. Extensive simulation results on a humanoid robot demonstrate improved perturbation rejection and tracking performance compared to HLIP alone, with the robot able to recover from pushes up to 100N for 0.2s while walking at commanded speeds up to 0.5m/s. Robustness is further validated in scenarios with angled walls and multi-directional pushes.
SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics
Robot learning has produced remarkably effective ``black-box'' controllers for complex tasks such as dynamic locomotion on humanoids. Yet ensuring dynamic safety, i.e., constraint satisfaction, remains challenging for such policies. Reinforcement learning (RL) embeds constraints heuristically through reward engineering, and adding or modifying constraints requires retraining. Model-based approaches, like control barrier functions (CBFs), enable runtime constraint specification with formal guarantees but require accurate dynamics models. This paper presents SHIELD, a layered safety framework that bridges this gap by: (1) training a generative, stochastic dynamics residual model using real-world data from hardware rollouts of the nominal controller, capturing system behavior and uncertainties; and (2) adding a safety layer on top of the nominal (learned locomotion) controller that leverages this model via a stochastic discrete-time CBF formulation enforcing safety constraints in probability. The result is a minimally-invasive safety layer that can be added to the existing autonomy stack to give probabilistic guarantees of safety that balance risk and performance. In hardware experiments on an Unitree G1 humanoid, SHIELD enables safe navigation (obstacle avoidance) through varied indoor and outdoor environments using a nominal (unknown) RL controller and onboard perception.
comment: Video at https://vimeo.com/1061676063
UMArm: Untethered, Modular, Wearable, Soft Pneumatic Arm
Robotic arms are essential to modern industries, however, their adaptability to unstructured environments remains limited. Soft robotic arms, particularly those actuated pneumatically, offer greater adaptability in unstructured environments and enhanced safety for human-robot interaction. However, current pneumatic soft arms are constrained by limited degrees of freedom, precision, payload capacity, and reliance on bulky external pressure regulators. In this work, a novel pneumatically driven rigid-soft hybrid arm, ``UMArm'', is presented. The shortcomings of pneumatically actuated soft arms are addressed by densely integrating high-force-to-weight-ratio, self-regulated McKibben actuators onto a lightweight rigid spine structure. The modified McKibben actuators incorporate valves and controllers directly inside, eliminating the need for individual pressure lines and external regulators, significantly reducing system weight and complexity. Full untethered operation, high payload capacity, precision, and directionally tunable compliance are achieved by the UMArm. Portability is demonstrated through a wearable assistive arm experiment, and versatility is showcased by reconfiguring the system into an inchworm robot. The results of this work show that the high-degree-of-freedom, external-regulator-free pneumatically driven arm systems like the UMArm possess great potential for real-world unstructured environments.
REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving
Achieving rapid and effective active collision avoidance in dynamic interactive traffic remains a core challenge for autonomous driving. This paper proposes REACT (Runtime-Enabled Active Collision-avoidance Technique), a closed-loop framework that integrates risk assessment with active avoidance control. By leveraging energy transfer principles and human-vehicle-road interaction modeling, REACT dynamically quantifies runtime risk and constructs a continuous spatial risk field. The system incorporates physically grounded safety constraints such as directional risk and traffic rules to identify high-risk zones and generate feasible, interpretable avoidance behaviors. A hierarchical warning trigger strategy and lightweight system design enhance runtime efficiency while ensuring real-time responsiveness. Evaluations across four representative high-risk scenarios including car-following braking, cut-in, rear-approaching, and intersection conflict demonstrate REACT's capability to accurately identify critical risks and execute proactive avoidance. Its risk estimation aligns closely with human driver cognition (i.e., warning lead time < 0.4 s), achieving 100% safe avoidance with zero false alarms or missed detections. Furthermore, it exhibits superior real-time performance (< 50 ms latency), strong foresight, and generalization. The lightweight architecture achieves state-of-the-art accuracy, highlighting its potential for real-time deployment in safety-critical autonomous systems.
comment: 22 pages, 11 figures
Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views
Vision based robot manipulation uses cameras to capture one or more images of a scene containing the objects to be manipulated. Taking multiple images can help if any object is occluded from one viewpoint but more visible from another viewpoint. However, the camera has to be moved to a sequence of suitable positions for capturing multiple images, which requires time and may not always be possible, due to reachability constraints. So while additional images can produce more accurate grasp poses due to the extra information available, the time-cost goes up with the number of additional views sampled. Scene representations like Gaussian Splatting are capable of rendering accurate photorealistic virtual images from user-specified novel viewpoints. In this work, we show initial results which indicate that novel view synthesis can provide additional context in generating grasp poses. Our experiments on the Graspnet-1billion dataset show that novel views contributed force-closure grasps in addition to the force-closure grasps obtained from sparsely sampled real views while also improving grasp coverage. In the future we hope this work can be extended to improve grasp extraction from radiance fields constructed with a single input image, using for example diffusion models or generalizable radiance fields.
comment: 6 pages
SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision ICRA
Accurate pose estimation of surgical tools in Robot-assisted Minimally Invasive Surgery (RMIS) is essential for surgical navigation and robot control. While traditional marker-based methods offer accuracy, they face challenges with occlusions, reflections, and tool-specific designs. Similarly, supervised learning methods require extensive training on annotated datasets, limiting their adaptability to new tools. Despite their success in other domains, zero-shot pose estimation models remain unexplored in RMIS for pose estimation of surgical instruments, creating a gap in generalising to unseen surgical tools. This paper presents a novel 6 Degrees of Freedom (DoF) pose estimation pipeline for surgical instruments, leveraging state-of-the-art zero-shot RGB-D models like the FoundationPose and SAM-6D. We advanced these models by incorporating vision-based depth estimation using the RAFT-Stereo method, for robust depth estimation in reflective and textureless environments. Additionally, we enhanced SAM-6D by replacing its instance segmentation module, Segment Anything Model (SAM), with a fine-tuned Mask R-CNN, significantly boosting segmentation accuracy in occluded and complex conditions. Extensive validation reveals that our enhanced SAM-6D surpasses FoundationPose in zero-shot pose estimation of unseen surgical instruments, setting a new benchmark for zero-shot RGB-D pose estimation in RMIS. This work enhances the generalisability of pose estimation for unseen objects and pioneers the application of RGB-D zero-shot methods in RMIS.
comment: To be published in 2025 International Conference on Robotics and Automation (ICRA)
Self-supervised perception for tactile skin covered dexterous hands
We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors distributed across the fingertips, phalanges, and palm of a dexterous robot hand. Magnetic tactile skins offer a flexible form factor for hand-wide coverage with fast response times, in contrast to vision-based tactile sensors that are restricted to the fingertips and limited by bandwidth. Full hand tactile perception is crucial for robot dexterity. However, a lack of general-purpose models, challenges with interpreting magnetic flux and calibration have limited the adoption of these sensors. Sparsh-skin, given a history of kinematic and tactile sensing across a hand, outputs a latent tactile embedding that can be used in any downstream task. The encoder is self-supervised via self-distillation on a variety of unlabeled hand-object interactions using an Allegro hand sensorized with Xela uSkin. In experiments across several benchmark tasks, from state estimation to policy learning, we find that pretrained Sparsh-skin representations are both sample efficient in learning downstream tasks and improve task performance by over 41% compared to prior work and over 56% compared to end-to-end learning.
comment: 18 pages, 15 figures
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a core task where embodied agents leverage their spatial mobility to navigate in 3D environments toward designated destinations based on natural language instructions. Recently, video-language large models (Video-VLMs) with strong generalization capabilities and rich commonsense knowledge have shown remarkable performance when applied to VLN tasks. However, these models still encounter the following challenges when applied to real-world 3D navigation: 1) Insufficient understanding of 3D geometry and spatial semantics; 2) Limited capacity for large-scale exploration and long-term environmental memory; 3) Poor adaptability to dynamic and changing environments.To address these limitations, we propose Dynam3D, a dynamic layered 3D representation model that leverages language-aligned, generalizable, and hierarchical 3D representations as visual input to train 3D-VLM in navigation action prediction. Given posed RGB-D images, our Dynam3D projects 2D CLIP features into 3D space and constructs multi-level 3D patch-instance-zone representations for 3D geometric and semantic understanding with a dynamic and layer-wise update strategy. Our Dynam3D is capable of online encoding and localization of 3D instances, and dynamically updates them in changing environments to provide large-scale exploration and long-term memory capabilities for navigation. By leveraging large-scale 3D-language pretraining and task-specific adaptation, our Dynam3D sets new state-of-the-art performance on VLN benchmarks including R2R-CE, REVERIE-CE and NavRAG-CE under monocular settings. Furthermore, experiments for pre-exploration, lifelong memory, and real-world robot validate the effectiveness of practical deployment.
Decoupling Collision Avoidance in and for Optimal Control using Least-Squares Support Vector Machines
This paper details an approach to linearise differentiable but non-convex collision avoidance constraints tailored to convex shapes. It revisits introducing differential collision avoidance constraints for convex objects into an optimal control problem (OCP) using the separating hyperplane theorem. By framing this theorem as a classification problem, the hyperplanes are eliminated as optimisation variables from the OCP. This effectively transforms non-convex constraints into linear constraints. A bi-level algorithm computes the hyperplanes between the iterations of an optimisation solver and subsequently embeds them as parameters into the OCP. Experiments demonstrate the approach's favourable scalability towards cluttered environments and its applicability to various motion planning approaches. It decreases trajectory computation times between 50\% and 90\% compared to a state-of-the-art approach that directly includes the hyperplanes as variables in the optimal control problem.
Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. However, they face significant challenges, including public acceptance, limited longevity, and barriers to commercialization. Meanwhile, noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. To address these issues, this study introduces a novel human-centered multimodal AI approach as intelligent compensatory mechanisms for lost motor functions that could potentially enable patients with severe paralysis to control high-dimensional assistive devices, such as dexterous robotic arms, using limited and noninvasive inputs. In contrast to the current state-of-the-art (SoTA) noninvasive approaches, our context-aware, multimodal shared-autonomy framework integrates deep reinforcement learning algorithms to blend limited low-dimensional user input with real-time environmental perception, enabling adaptive, dynamic, and intelligent interpretation of human intent for complex dexterous manipulation tasks, such as pick-and-place. The results from our ARAS (Adaptive Reinforcement learning for Amplification of limited inputs in Shared autonomy) trained with synthetic users over 50,000 computer simulation episodes demonstrated the first successful implementation of the proposed closed-loop human-in-the-loop paradigm, outperforming the SoTA shared autonomy algorithms. Following a zero-shot sim-to-real transfer, ARAS was evaluated on 23 human subjects, demonstrating high accuracy in dynamic intent detection and smooth, stable 3D trajectory control for dexterous pick-and-place tasks. ARAS user study achieved a high task success rate of 92.88%, with short completion times comparable to those of SoTA invasive assistive technologies.
Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild
To perform autonomous visual search for environmental monitoring, a robot may leverage satellite imagery as a prior map. This can help inform coarse, high-level search and exploration strategies, even when such images lack sufficient resolution to allow fine-grained, explicit visual recognition of targets. However, there are some challenges to overcome with using satellite images to direct visual search. For one, targets that are unseen in satellite images are underrepresented (compared to ground images) in most existing datasets, and thus vision models trained on these datasets fail to reason effectively based on indirect visual cues. Furthermore, approaches which leverage large Vision Language Models (VLMs) for generalization may yield inaccurate outputs due to hallucination, leading to inefficient search. To address these challenges, we introduce Search-TTA, a multimodal test-time adaptation framework that can accept text and/or image input. First, we pretrain a remote sensing image encoder to align with CLIP's visual encoder to output probability distributions of target presence used for visual search. Second, our framework dynamically refines CLIP's predictions during search using a test-time adaptation mechanism. Through a feedback loop inspired by Spatial Poisson Point Processes, gradient updates (weighted by uncertainty) are used to correct (potentially inaccurate) predictions and improve search performance. To validate Search-TTA's performance, we curate a visual search dataset based on internet-scale ecological data. We find that Search-TTA improves planner performance by up to 9.7%, particularly in cases with poor initial CLIP predictions. It also achieves comparable performance to state-of-the-art VLMs. Finally, we deploy Search-TTA on a real UAV via hardware-in-the-loop testing, by simulating its operation within a large-scale simulation that provides onboard sensing.
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios
Ensuring the safety and robustness of autonomous driving systems necessitates a comprehensive evaluation in safety-critical scenarios. However, these safety-critical scenarios are rare and difficult to collect from real-world driving data, posing significant challenges to effectively assessing the performance of autonomous vehicles. Typical existing methods often suffer from limited controllability and lack user-friendliness, as extensive expert knowledge is essentially required. To address these challenges, we propose LD-Scene, a novel framework that integrates Large Language Models (LLMs) with Latent Diffusion Models (LDMs) for user-controllable adversarial scenario generation through natural language. Our approach comprises an LDM that captures realistic driving trajectory distributions and an LLM-based guidance module that translates user queries into adversarial loss functions, facilitating the generation of scenarios aligned with user queries. The guidance module integrates an LLM-based Chain-of-Thought (CoT) code generator and an LLM-based code debugger, enhancing the controllability and robustness in generating guidance functions. Extensive experiments conducted on the nuScenes dataset demonstrate that LD-Scene achieves state-of-the-art performance in generating realistic, diverse, and effective adversarial scenarios. Furthermore, our framework provides fine-grained control over adversarial behaviors, thereby facilitating more effective testing tailored to specific driving scenarios.
comment: 13 pages, 5 figures
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Vision-Language-Action (VLA) models have recently become highly prominent in the field of robotics. Leveraging vision-language foundation models trained on large-scale internet data, the VLA model can generate robotic actions directly from visual observations and human instructions through a single end-to-end neural network. Despite their effectiveness, current VLA models usually accept only one form of human prompting, language instructions, which may constrain their applicability in open-ended human-robot interactions. For example, a user might expect the robot to retrieve an object shown in an image, follow an instruction written on the whiteboard, or imitate a behavior demonstrated in a video, rather than relying solely on language-based descriptions. To address this gap, we introduce OE-VLA, which explores the potential of VLA models for open-ended multimodal instructions. Extensive results demonstrate that our OE-VLA not only achieves comparable performance to traditional VLA models with linguistic input but also delivers impressive results across four additional categories of open-ended tasks. The proposed methodology could significantly expand the applications of VLA models across various everyday scenarios and facilitate human-robot interaction.
Multi-Modal Multi-Task (M3T) Federated Foundation Models for Embodied AI: Potentials and Challenges for Edge Integration
As embodied AI systems become increasingly multi-modal, personalized, and interactive, they must learn effectively from diverse sensory inputs, adapt continually to user preferences, and operate safely under resource and privacy constraints. These challenges expose a pressing need for machine learning models capable of swift, context-aware adaptation while balancing model generalization and personalization. Here, two methods emerge as suitable candidates, each offering parts of these capabilities: Foundation Models (FMs) provide a pathway toward generalization across tasks and modalities, whereas Federated Learning (FL) offers the infrastructure for distributed, privacy-preserving model updates and user-level model personalization. However, when used in isolation, each of these approaches falls short of meeting the complex and diverse capability requirements of real-world embodied environments. In this vision paper, we introduce Federated Foundation Models (FFMs) for embodied AI, a new paradigm that unifies the strengths of multi-modal multi-task (M3T) FMs with the privacy-preserving distributed nature of FL, enabling intelligent systems at the wireless edge. We collect critical deployment dimensions of FFMs in embodied AI ecosystems under a unified framework, which we name "EMBODY": Embodiment heterogeneity, Modality richness and imbalance, Bandwidth and compute constraints, On-device continual learning, Distributed control and autonomy, and Yielding safety, privacy, and personalization. For each, we identify concrete challenges and envision actionable research directions. We also present an evaluation framework for deploying FFMs in embodied AI systems, along with the associated trade-offs.
comment: 10 pages, 3 figures, 3 tables
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Generative skill acquisition enables embodied agents to actively learn a scalable and evolving repertoire of control skills, crucial for the advancement of large decision models. While prior approaches often rely on supervision signals from generalist agents (e.g., LLMs), their effectiveness in complex 3D environments remains unclear; exhaustive evaluation incurs substantial computational costs, significantly hindering the efficiency of skill learning. Inspired by recent successes in verification models for mathematical reasoning, we propose VERGSA (Verifying Embodied Reasoning in Generative Skill Acquisition), a framework that systematically integrates real-time verification principles into embodied skill learning. VERGSA establishes 1) a seamless extension from verification of mathematical reasoning into embodied learning by dynamically incorporating contextually relevant tasks into prompts and defining success metrics for both subtasks and overall tasks, and 2) an automated, scalable reward labeling scheme that synthesizes dense reward signals by iteratively finalizing the contribution of scene configuration and subtask learning to overall skill acquisition. To the best of our knowledge, this approach constitutes the first comprehensive training dataset for verification-driven generative skill acquisition, eliminating arduous manual reward engineering. Experiments validate the efficacy of our approach: 1) the exemplar task pool improves the average task success rates by 21%, 2) our verification model boosts success rates by 24% for novel tasks and 36% for encountered tasks, and 3) outperforms LLM-as-a-Judge baselines in verification quality.
Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning
Legged robots are well-suited for navigating terrains inaccessible to wheeled robots, making them ideal for applications in search and rescue or space exploration. However, current control methods often struggle to generalize across diverse, unstructured environments. This paper introduces a novel framework for agile locomotion of legged robots by combining multi-expert distillation with reinforcement learning (RL) fine-tuning to achieve robust generalization. Initially, terrain-specific expert policies are trained to develop specialized locomotion skills. These policies are then distilled into a unified foundation policy via the DAgger algorithm. The distilled policy is subsequently fine-tuned using RL on a broader terrain set, including real-world 3D scans. The framework allows further adaptation to new terrains through repeated fine-tuning. The proposed policy leverages depth images as exteroceptive inputs, enabling robust navigation across diverse, unstructured terrains. Experimental results demonstrate significant performance improvements over existing methods in synthesizing multi-terrain skills into a single controller. Deployment on the ANYmal D robot validates the policy's ability to navigate complex environments with agility and robustness, setting a new benchmark for legged robot locomotion.
X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation
The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication. However, the lack of datasets containing diverse humanoid facial expressions with proper annotations hinders progress in realistic humanoid facial expression imitation. To address these challenges, we introduce X2C (Anything to Control), a dataset featuring nuanced facial expressions for realistic humanoid imitation. With X2C, we contribute: 1) a high-quality, high-diversity, large-scale dataset comprising 100,000 (image, control value) pairs. Each image depicts a humanoid robot displaying a diverse range of facial expressions, annotated with 30 control values representing the ground-truth expression configuration; 2) X2CNet, a novel human-to-humanoid facial expression imitation framework that learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C. It enables facial expression imitation in the wild for different human performers, providing a baseline for the imitation task, showcasing the potential value of our dataset; 3) real-world demonstrations on a physical humanoid robot, highlighting its capability to advance realistic humanoid facial expression imitation. Code and Data: https://lipzh5.github.io/X2CNet/
Open-Source Multi-Viewpoint Surgical Telerobotics ICRA
As robots for minimally invasive surgery (MIS) gradually become more accessible and modular, we believe there is a great opportunity to rethink and expand the visualization and control paradigms that have characterized surgical teleoperation since its inception. We conjecture that introducing one or more additional adjustable viewpoints in the abdominal cavity would not only unlock novel visualization and collaboration strategies for surgeons but also substantially boost the robustness of machine perception toward shared autonomy. Immediate advantages include controlling a second viewpoint and teleoperating surgical tools from a different perspective, which would allow collaborating surgeons to adjust their views independently and still maneuver their robotic instruments intuitively. Furthermore, we believe that capturing synchronized multi-view 3D measurements of the patient's anatomy would unlock advanced scene representations. Accurate real-time intraoperative 3D perception will allow algorithmic assistants to directly control one or more robotic instruments and/or robotic cameras. Toward these goals, we are building a synchronized multi-viewpoint, multi-sensor robotic surgery system by integrating high-performance vision components and upgrading the da Vinci Research Kit control logic. This short paper reports a functional summary of our setup and elaborates on its potential impacts in research and future clinical practice. By fully open-sourcing our system, we will enable the research community to reproduce our setup, improve it, and develop powerful algorithms, effectively boosting clinical translation of cutting-edge research.
comment: 2 pages, 2 figures, ICRA-RAMI workshop long abstract
Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design
We propose a novel reinforcement learning (RL) design to optimize the charging strategy for autonomous mobile robots in large-scale block stacking warehouses. RL design involves a wide array of choices that can mostly only be evaluated through lengthy experimentation. Our study focuses on how different reward and action space configurations, ranging from flexible setups to more guided, domain-informed design configurations, affect the agent performance. Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times. Furthermore, our findings highlight a trade-off: While more open-ended designs are able to discover well-performing strategies on their own, they may require longer convergence times and are less stable, whereas guided configurations lead to a more stable learning process but display a more limited generalization potential. Our contributions are threefold. First, we extend SLAPStack, an open-source, RL-compatible simulation-framework to accommodate charging strategies. Second, we introduce a novel RL design for tackling the charging strategy problem. Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate the design using a Proximal Policy Optimization agent and varying different design configurations, with a focus on reward.
comment: Under review LION19: The 19th Learning and Intelligent OptimizatioN Conference
Conditioning Matters: Training Diffusion Policies is Faster Than You Think
Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse. To overcome this, we propose Cocos, a simple yet general solution that modifies the source distribution in the conditional flow matching to be condition-dependent. By anchoring the source distribution around semantics extracted from condition inputs, Cocos encourages stronger condition integration and prevents the loss collapse. We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks. Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs using significantly fewer gradient steps and parameters. Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.
comment: arXiv admin note: substantial text overlap with arXiv:2505.10105
Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow
Accurate velocity estimation is critical in mobile robotics, particularly for driver assistance systems and autonomous driving. Wheel odometry fused with Inertial Measurement Unit (IMU) data is a widely used method for velocity estimation; however, it typically requires strong assumptions, such as non-slip steering, or complex vehicle dynamics models that do not hold under varying environmental conditions like slippery surfaces. We introduce an approach to velocity estimation that is decoupled from wheel-to-surface traction assumptions by leveraging planar kinematics in combination with optical flow from event cameras pointed perpendicularly at the ground. The asynchronous micro-second latency and high dynamic range of event cameras make them highly robust to motion blur, a common challenge in vision-based perception techniques for autonomous driving. The proposed method is evaluated through in-field experiments on a 1:10 scale autonomous racing platform and compared to precise motion capture data, demonstrating not only performance on par with the state-of-the-art Event-VIO method but also a 38.3 % improvement in lateral error. Qualitative experiments at highway speeds of up to 32 m/s further confirm the effectiveness of our approach, indicating significant potential for real-world deployment.
PARSEC: Preference Adaptation for Robotic Object Rearrangement from Scene Context
Object rearrangement is a key task for household robots requiring personalization without explicit instructions, meaningful object placement in environments occupied with objects, and generalization to unseen objects and new environments. To facilitate research addressing these challenges, we introduce PARSEC, an object rearrangement benchmark for learning user organizational preferences from observed scene context to place objects in a partially arranged environment. PARSEC is built upon a novel dataset of 110K rearrangement examples crowdsourced from 72 users, featuring 93 object categories and 15 environments. We also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences from prior and current scene context while accounting for multiple valid placements. We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark and complement these findings with a crowdsourced evaluation of 108 online raters ranking model predictions based on alignment with user preferences. Our results indicate that personalized rearrangement models leveraging multiple scene context sources perform better than models relying on a single context source. Moreover, ContextSortLM outperforms other models in placing objects to replicate the target user's arrangement and ranks among the top two in all three environment categories, as rated by online evaluators. Importantly, our evaluation highlights challenges associated with modeling environment semantics across different environment categories and provides recommendations for future work.
comment: Under review at ROMAN 2025
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.
GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets
Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GROQLoco, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations from two distinct locomotion behaviors - stair traversal (non-periodic gaits) and flat terrain traversal (periodic gaits) - collected across multiple quadruped robots, to train a generalist model that enables behavior fusion for both behaviors. Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 nuc, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70kg quadruped, on flat and outdoor terrains without requiring any fine tuning. These results highlight the potential for robust generalist locomotion across diverse robots and terrains.
comment: 18pages, 16figures, 6tables
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats
Accurate temporal reconstructions of plant growth are essential for plant phenotyping and breeding, yet remain challenging due to complex geometries, occlusions, and non-rigid deformations of plants. We present a novel framework for building temporal digital twins of plants by combining 3D Gaussian Splatting with a robust sample alignment pipeline. Our method begins by reconstructing Gaussian Splats from multi-view camera data, then leverages a two-stage registration approach: coarse alignment through feature-based matching and Fast Global Registration, followed by fine alignment with Iterative Closest Point. This pipeline yields a consistent 4D model of plant development in discrete time steps. We evaluate the approach on data from the Netherlands Plant Eco-phenotyping Center, demonstrating detailed temporal reconstructions of Sequoia and Quinoa species. Videos and Images can be seen at https://berkeleyautomation.github.io/GrowSplat/
Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space
Humans possess a large reachable space in the 3D world, enabling interaction with objects at varying heights and distances. However, realizing such large-space reaching on humanoids is a complex whole-body control problem and requires the robot to master diverse skills simultaneously-including base positioning and reorientation, height and body posture adjustments, and end-effector pose control. Learning from scratch often leads to optimization difficulty and poor sim2real transferability. To address this challenge, we propose Real-world-Ready Skill Space (R2S2). Our approach begins with a carefully designed skill library consisting of real-world-ready primitive skills. We ensure optimal performance and robust sim2real transfer through individual skill tuning and sim2real evaluation. These skills are then ensembled into a unified latent space, serving as a structured prior that helps task execution in an efficient and sim2real transferable manner. A high-level planner, trained to sample skills from this space, enables the robot to accomplish real-world goal-reaching tasks. We demonstrate zero-shot sim2real transfer and validate R2S2 in multiple challenging goal-reaching scenarios.
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND's reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4x in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks, beating baselines by 2x in simulation and improving real-world pretrained bimanual policies by 5x, taking a step towards scalable, real-world robot learning. See website at https://rewind-reward.github.io/.
Estimating Deformable-Rigid Contact Interactions for a Deformable Tool via Learning and Model-Based Optimization
Dexterous manipulation requires careful reasoning over extrinsic contacts. The prevalence of deforming tools in human environments, the use of deformable sensors, and the increasing number of soft robots yields a need for approaches that enable dexterous manipulation through contact reasoning where not all contacts are well characterized by classical rigid body contact models. Here, we consider the case of a deforming tool dexterously manipulating a rigid object. We propose a hybrid learning and first-principles approach to the modeling of simultaneous motion and force transfer of tools and objects. The learned module is responsible for jointly estimating the rigid object's motion and the deformable tool's imparted contact forces. We then propose a Contact Quadratic Program to recover forces between the environment and object subject to quasi-static equilibrium and Coulomb friction. The results is a system capable of modeling both intrinsic and extrinsic motions, contacts, and forces during dexterous deformable manipulation. We train our method in simulation and show that our method outperforms baselines under varying block geometries and physical properties, during pushing and pivoting manipulations, and demonstrate transfer to real world interactions. Video results can be found at https://deform-rigid-contact.github.io/.
comment: 8 pages. IEEE Robotics and Automation Letters, 2025
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
Robot task planning decomposes human instructions into executable action sequences that enable robots to complete a series of complex tasks. Although recent large language model (LLM)-based task planners achieve amazing performance, they assume that human instructions are clear and straightforward. However, real-world users are not experts, and their instructions to robots often contain significant vagueness. Linguists suggest that such vagueness frequently arises from referring expressions (REs), whose meanings depend heavily on dialogue context and environment. This vagueness is even more prevalent among the elderly and children, who robots should serve more. This paper studies how such vagueness in REs within human instructions affects LLM-based robot task planning and how to overcome this issue. To this end, we propose the first robot task planning benchmark with vague REs (REI-Bench), where we discover that the vagueness of REs can severely degrade robot planning performance, leading to success rate drops of up to 77.9%. We also observe that most failure cases stem from missing objects in planners. To mitigate the REs issue, we propose a simple yet effective approach: task-oriented context cognition, which generates clear instructions for robots, achieving state-of-the-art performance compared to aware prompt and chains of thought. This work contributes to the research community of human-robot interaction (HRI) by making robot task planning more practical, particularly for non-expert users, e.g., the elderly and children.
comment: Submitted to CoRL 2025, under review
Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS
Simultaneous localization and mapping (SLAM) approaches for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments possess additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Utilizing the modified Hausdorff distance (MHD) metric, the method can solve the scan matching robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.
mmMirror: Device Free mmWave Indoor NLoS Localization Using Van-Atta-Array IRS
Industry 4.0 is transforming manufacturing and logistics by integrating robots into shared human environments, such as factories, warehouses, and healthcare facilities. However, the risk of human-robot collisions, especially in Non-Line-of-Sight (NLoS) scenarios like around corners, remains a critical challenge. Existing solutions, such as vision-based and LiDAR systems, often fail under occlusion, lighting constraints, or privacy concerns, while RF-based systems are limited by range and accuracy. To address these limitations, we propose mmMirror, a novel system leveraging a Van Atta Array-based millimeter-wave (mmWave) reconfigurable intelligent reflecting surface (IRS) for precise, device-free NLoS localization. mmMirror integrates seamlessly with existing frequency-modulated continuous-wave (FMCW) radars and offers: (i) robust NLoS localization with centimeter-level accuracy at ranges up to 3 m, (ii) seamless uplink and downlink communication between radar and IRS, (iii) support for multi-radar and multi-target scenarios via dynamic beam steering, and (iv) reduced scanning latency through adaptive time slot allocation. Implemented using commodity 24 GHz radars and a PCB-based IRS prototype, mmMirror demonstrates its potential in enabling safe human-robot interactions in dynamic and complex environments.
Geofenced Unmanned Aerial Robotic Defender for Deer Detection and Deterrence (GUARD) ICRA
Wildlife-induced crop damage, particularly from deer, threatens agricultural productivity. Traditional deterrence methods often fall short in scalability, responsiveness, and adaptability to diverse farmland environments. This paper presents an integrated unmanned aerial vehicle (UAV) system designed for autonomous wildlife deterrence, developed as part of the Farm Robotics Challenge. Our system combines a YOLO-based real-time computer vision module for deer detection, an energy-efficient coverage path planning algorithm for efficient field monitoring, and an autonomous charging station for continuous operation of the UAV. In collaboration with a local Minnesota farmer, the system is tailored to address practical constraints such as terrain, infrastructure limitations, and animal behavior. The solution is evaluated through a combination of simulation and field testing, demonstrating robust detection accuracy, efficient coverage, and extended operational time. The results highlight the feasibility and effectiveness of drone-based wildlife deterrence in precision agriculture, offering a scalable framework for future deployment and extension.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
Counterfactual Behavior Cloning: Offline Imitation Learning from Imperfect Human Demonstrations
Learning from humans is challenging because people are imperfect teachers. When everyday humans show the robot a new task they want it to perform, humans inevitably make errors (e.g., inputting noisy actions) and provide suboptimal examples (e.g., overshooting the goal). Existing methods learn by mimicking the exact behaviors the human teacher provides -- but this approach is fundamentally limited because the demonstrations themselves are imperfect. In this work we advance offline imitation learning by enabling robots to extrapolate what the human teacher meant, instead of only considering what the human actually showed. We achieve this by hypothesizing that all of the human's demonstrations are trying to convey a single, consistent policy, while the noise and sub-optimality within their behaviors obfuscates the data and introduces unintentional complexity. To recover the underlying policy and learn what the human teacher meant, we introduce Counter-BC, a generalized version of behavior cloning. Counter-BC expands the given dataset to include actions close to behaviors the human demonstrated (i.e., counterfactual actions that the human teacher could have intended, but did not actually show). During training Counter-BC autonomously modifies the human's demonstrations within this expanded region to reach a simple and consistent policy that explains the underlying trends in the human's dataset. Theoretically, we prove that Counter-BC can extract the desired policy from imperfect data, multiple users, and teachers of varying skill levels. Empirically, we compare Counter-BC to state-of-the-art alternatives in simulated and real-world settings with noisy demonstrations, standardized datasets, and real human teachers. See videos of our work here: https://youtu.be/XaeOZWhTt68
Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning
Few-shot adaptation remains a core challenge for vision-language models (VLMs), especially under limited supervision and noisy support samples. We propose PromptFuseNL, a unified framework that enhances few-shot generalization by combining predictive prompt tuning with dual-branch positive and negative learning. The method refines class prototypes through task-conditioned residuals, multi-stage cross-modal coordination, and semantic hard negative mining. To address label noise, we introduce an unsupervised instance reweighting strategy that downweights unreliable support examples without requiring additional labels or structural changes. PromptFuseNL fuses visual and textual cues through lightweight modules for efficient and discriminative prediction. Evaluated across 15 benchmarks, it consistently surpasses existing prompt- and adapter-based methods in all shot settings while remaining highly efficient, achieving up to 300x faster training and 1000x lower FLOPs compared to full prompt tuning, achieving a new state-of-the-art for robust and scalable few-shot vision-language adaptation.
Reachability Barrier Networks: Learning Hamilton-Jacobi Solutions for Smooth and Flexible Control Barrier Functions
Recent developments in autonomous driving and robotics underscore the necessity of safety-critical controllers. Control barrier functions (CBFs) are a popular method for appending safety guarantees to a general control framework, but they are notoriously difficult to generate beyond low dimensions. Existing methods often yield non-differentiable or inaccurate approximations that lack integrity, and thus fail to ensure safety. In this work, we use physics-informed neural networks (PINNs) to generate smooth approximations of CBFs by computing Hamilton-Jacobi (HJ) optimal control solutions. These reachability barrier networks (RBNs) avoid traditional dimensionality constraints and support the tuning of their conservativeness post-training through a parameterized discount term. To ensure robustness of the discounted solutions, we leverage conformal prediction methods to derive probabilistic safety guarantees for RBNs. We demonstrate that RBNs are highly accurate in low dimensions, and safer than the standard neural CBF approach in high dimensions. Namely, we showcase the RBNs in a 9D multi-vehicle collision avoidance problem where it empirically proves to be 5.5x safer and 1.9x less conservative than the neural CBFs, offering a promising method to synthesize CBFs for general nonlinear autonomous systems.
comment: 15 pages, 7 figures
Zero-Shot Visual Generalization in Robot Manipulation
Training vision-based manipulation policies that are robust across diverse visual environments remains an important and unresolved challenge in robot learning. Current approaches often sidestep the problem by relying on invariant representations such as point clouds and depth, or by brute-forcing generalization through visual domain randomization and/or large, visually diverse datasets. Disentangled representation learning - especially when combined with principles of associative memory - has recently shown promise in enabling vision-based reinforcement learning policies to be robust to visual distribution shifts. However, these techniques have largely been constrained to simpler benchmarks and toy environments. In this work, we scale disentangled representation learning and associative memory to more visually and dynamically complex manipulation tasks and demonstrate zero-shot adaptability to visual perturbations in both simulation and on real hardware. We further extend this approach to imitation learning, specifically Diffusion Policy, and empirically show significant gains in visual generalization compared to state-of-the-art imitation learning methods. Finally, we introduce a novel technique adapted from the model equivariance literature that transforms any trained neural network policy into one invariant to 2D planar rotations, making our policy not only visually robust but also resilient to certain camera perturbations. We believe that this work marks a significant step towards manipulation policies that are not only adaptable out of the box, but also robust to the complexities and dynamical nature of real-world deployment. Supplementary videos are available at https://sites.google.com/view/vis-gen-robotics/home.
Employing Laban Shape for Generating Emotionally and Functionally Expressive Trajectories in Robotic Manipulators
Successful human-robot collaboration depends on cohesive communication and a precise understanding of the robot's abilities, goals, and constraints. While robotic manipulators offer high precision, versatility, and productivity, they exhibit expressionless and monotonous motions that conceal the robot's intention, resulting in a lack of efficiency and transparency with humans. In this work, we use Laban notation, a dance annotation language, to enable robotic manipulators to generate trajectories with functional expressivity, where the robot uses nonverbal cues to communicate its abilities and the likelihood of succeeding at its task. We achieve this by introducing two novel variants of Hesitant expressive motion (Spoke-Like and Arc-Like). We also enhance the emotional expressivity of four existing emotive trajectories (Happy, Sad, Shy, and Angry) by augmenting Laban Effort usage with Laban Shape. The functionally expressive motions are validated via a human-subjects study, where participants equate both variants of Hesitant motion with reduced robot competency. The enhanced emotive trajectories are shown to be viewed as distinct emotions using the Valence-Arousal-Dominance (VAD) spectrum, corroborating the usage of Laban Shape.
comment: Under review for the 2025 IEEE RO-MAN Conference
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object manipulation. To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. EgoDex has 829 hours of egocentric video with paired 3D hand and finger tracking data collected at the time of recording, where multiple calibrated cameras and on-device SLAM can be used to precisely track the pose of every joint of each hand. The dataset covers a wide range of diverse manipulation behaviors with everyday household objects in 194 different tabletop tasks ranging from tying shoelaces to folding laundry. Furthermore, we train and systematically evaluate imitation learning policies for hand trajectory prediction on the dataset, introducing metrics and benchmarks for measuring progress in this increasingly important area. By releasing this large-scale dataset, we hope to push the frontier of robotics, computer vision, and foundation models.
Grounded Task Axes: Zero-Shot Semantic Skill Generalization via Task-Axis Controllers and Visual Foundation Models
Transferring skills between different objects remains one of the core challenges of open-world robot manipulation. Generalization needs to take into account the high-level structural differences between distinct objects while still maintaining similar low-level interaction control. In this paper, we propose an example-based zero-shot approach to skill transfer. Rather than treating skills as atomic, we decompose skills into a prioritized list of grounded task-axis (GTA) controllers. Each GTAC defines an adaptable controller, such as a position or force controller, along an axis. Importantly, the GTACs are grounded in object key points and axes, e.g., the relative position of a screw head or the axis of its shaft. Zero-shot transfer is thus achieved by finding semantically-similar grounding features on novel target objects. We achieve this example-based grounding of the skills through the use of foundation models, such as SD-DINO, that can detect semantically similar keypoints of objects. We evaluate our framework on real-robot experiments, including screwing, pouring, and spatula scraping tasks, and demonstrate robust and versatile controller transfer for each.
Adaptive Ergodic Search with Energy-Aware Scheduling for Persistent Multi-Robot Missions
Autonomous robots are increasingly deployed for long-term information-gathering tasks, which pose two key challenges: planning informative trajectories in environments that evolve across space and time, and ensuring persistent operation under energy constraints. This paper presents a unified framework, mEclares, that addresses both challenges through adaptive ergodic search and energy-aware scheduling in multi-robot systems. Our contributions are two-fold: (1) we model real-world variability using stochastic spatiotemporal environments, where the underlying information evolves unpredictably due to process uncertainty. To guide exploration, we construct a target information spatial distribution (TISD) based on clarity, a metric that captures the decay of information in the absence of observations and highlights regions of high uncertainty; and (2) we introduce Robustmesch (Rmesch), an online scheduling method that enables persistent operation by coordinating rechargeable robots sharing a single mobile charging station. Unlike prior work, our approach avoids reliance on preplanned schedules, static or dedicated charging stations, and simplified robot dynamics. Instead, the scheduler supports general nonlinear models, accounts for uncertainty in the estimated position of the charging station, and handles central node failures. The proposed framework is validated through real-world hardware experiments, and feasibility guarantees are provided under specific assumptions.
comment: Under review at Autonomous Robots
Monotone Subsystem Decomposition for Efficient Multi-Objective Robot Design ICRA
Automating design minimizes errors, accelerates the design process, and reduces cost. However, automating robot design is challenging due to recursive constraints, multiple design objectives, and cross-domain design complexity possibly spanning multiple abstraction layers. Here we look at the problem of component selection, a combinatorial optimization problem in which a designer, given a robot model, must select compatible components from an extensive catalog. The goal is to satisfy high-level task specifications while optimally balancing trade-offs between competing design objectives. In this paper, we extend our previous constraint programming approach to multi-objective design problems and propose the novel technique of monotone subsystem decomposition to efficiently compute a Pareto front of solutions for large-scale problems. We prove that subsystems can be optimized for their Pareto fronts and, under certain conditions, these results can be used to determine a globally optimal Pareto front. Furthermore, subsystems serve as an intuitive design abstraction and can be reused across various design problems. Using an example quadcopter design problem, we compare our method to a linear programming approach and demonstrate our method scales better for large catalogs, solving a multi-objective problem of 10^25 component combinations in seconds. We then expand the original problem and solve a task-oriented, multi-objective design problem to build a fleet of quadcopters to deliver packages. We compute a Pareto front of solutions in seconds where each solution contains an optimal component-level design and an optimal package delivery schedule for each quadcopter.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025
Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization ICRA 2025
Ground texture localization using a downward-facing camera offers a low-cost, high-precision localization solution that is robust to dynamic environments and requires no environmental modification. We present a significantly improved bag-of-words (BoW) image retrieval system for ground texture localization, achieving substantially higher accuracy for global localization and higher precision and recall for loop closure detection in SLAM. Our approach leverages an approximate $k$-means (AKM) vocabulary with soft assignment, and exploits the consistent orientation and constant scale constraints inherent to ground texture localization. Identifying the different needs of global localization vs. loop closure detection for SLAM, we present both high-accuracy and high-speed versions of our algorithm. We test the effect of each of our proposed improvements through an ablation study and demonstrate our method's effectiveness for both global localization and loop closure detection. With numerous ground texture localization systems already using BoW, our method can readily replace other generic BoW systems in their pipeline and immediately improve their results.
comment: Accepted to ICRA 2025
Mechanically Programming the Cross-Sectional Shape of Soft Growing Robotic Structures for Patient Transfer
Pneumatic soft everting robotic structures have the potential to facilitate human transfer tasks due to their ability to grow underneath humans without sliding friction and their utility as a flexible sling when deflated. Tubular structures naturally yield circular cross-sections when inflated, whereas a robotic sling must be both thin enough to grow between them and their resting surface and wide enough to cradle the human. Recent works have achieved flattened cross-sections by including rigid components into the structure, but this reduces conformability to the human. We present a method of mechanically programming the cross-section of soft everting robotic structures using flexible strips that constrain radial expansion between points along the outer membrane. Our method enables simultaneously wide and thin profiles while maintaining the full multi-axis flexibility of traditional slings. We develop and validate a model relating the geometric design specifications to the fabrication parameters, and experimentally characterize their effects on growth rate. Finally, we prototype a soft growing robotic sling system and demonstrate its use for assisting a single caregiver in bed-to-chair patient transfer.
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Mimicry is a fundamental learning mechanism in humans, enabling individuals to learn new tasks by observing and imitating experts. However, applying this ability to robots presents significant challenges due to the inherent differences between human and robot embodiments in both their visual appearance and physical capabilities. While previous methods bridge this gap using cross-embodiment datasets with shared scenes and tasks, collecting such aligned data between humans and robots at scale is not trivial. In this paper, we propose UniSkill, a novel framework that learns embodiment-agnostic skill representations from large-scale cross-embodiment video data without any labels, enabling skills extracted from human video prompts to effectively transfer to robot policies trained only on robot data. Our experiments in both simulation and real-world environments show that our cross-embodiment skills successfully guide robots in selecting appropriate actions, even with unseen video prompts. The project website can be found at: https://kimhanjung.github.io/UniSkill.
comment: Project Page: https://kimhanjung.github.io/UniSkill/
Grasp EveryThing (GET): 1-DoF, 3-Fingered Gripper with Tactile Sensing for Robust Grasping
We introduce the Grasp EveryThing (GET) gripper, a novel 1-DoF, 3-finger design for securely grasping objects of many shapes and sizes. Mounted on a standard parallel jaw actuator, the design features three narrow, tapered fingers arranged in a two-against-one configuration, where the two fingers converge into a V-shape. The GET gripper is more capable of conforming to object geometries and forming secure grasps than traditional designs with two flat fingers. Inspired by the principle of self-similarity, these V-shaped fingers enable secure grasping across a wide range of object sizes. Further to this end, fingers are parametrically designed for convenient resizing and interchangeability across robotic embodiments with a parallel jaw gripper. Additionally, we incorporate a rigid fingernail to enhance small object manipulation. Tactile sensing can be integrated into the standalone finger via an externally-mounted camera. A neural network was trained to estimate normal force from tactile images with an average validation error of 1.3~N across a diverse set of geometries. In grasping 15 objects and performing 3 tasks via teleoperation, the GET fingers consistently outperformed standard flat fingers. Finger designs for use with multiple robotic embodiments are available on GitHub.
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Fast and Robust Visuomotor Riemannian Flow Matching Policy
Diffusion-based visuomotor policies excel at learning complex robotic tasks by effectively combining visual data with high-dimensional, multi-modal action distributions. However, diffusion models often suffer from slow inference due to costly denoising processes or require complex sequential training arising from recent distilling approaches. This paper introduces Riemannian Flow Matching Policy (RFMP), a model that inherits the easy training and fast inference capabilities of flow matching (FM). Moreover, RFMP inherently incorporates geometric constraints commonly found in realistic robotic applications, as the robot state resides on a Riemannian manifold. To enhance the robustness of RFMP, we propose Stable RFMP (SRFMP), which leverages LaSalle's invariance principle to equip the dynamics of FM with stability to the support of a target Riemannian distribution. Rigorous evaluation on eight simulated and real-world tasks show that RFMP successfully learns and synthesizes complex sensorimotor policies on Euclidean and Riemannian spaces with efficient training and inference phases, outperforming Diffusion Policies and Consistency Policies.
comment: 17 pages, 12 figures, 12 tables, project website: https://sites.google.com/view/rfmp
Automating High Quality RT Planning at Scale
Radiotherapy (RT) planning is complex, subjective, and time-intensive. Advances with artificial intelligence (AI) promise to improve its precision and efficiency, but progress is often limited by the scarcity of large, standardized datasets. To address this, we introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution for generating high-quality treatment plans. This scalable solution is designed to generate substantial volumes of consistently high-quality treatment plans, overcoming a key obstacle in the advancement of AI-driven RT planning. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement, using AI integrated with RT planning software like Varian Eclipse. Furthermore, a novel approach for determining optimization parameters to reproduce 3D dose distributions, i.e. a method to convert dose predictions to deliverable treatment plans constrained by machine limitations is proposed. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually, which traditionally require several hours of labor per plan. Committed to public research, the first data release of our AIRTP pipeline includes nine cohorts covering head-and-neck and lung cancer sites to support an AAPM 2025 challenge. To our best knowledge, this dataset features more than 10 times number of plans compared to the largest existing well-curated public dataset. Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
comment: radiotherapy planning, data for AI training
Should Collaborative Robots be Transparent?
We often assume that robots which collaborate with humans should behave in ways that are transparent (e.g., legible, explainable). These transparent robots intentionally choose actions that convey their internal state to nearby humans: for instance, a transparent robot might exaggerate its trajectory to indicate its goal. But while transparent behavior seems beneficial for human-robot interaction, is it actually optimal? In this paper we consider collaborative settings where the human and robot have the same objective, and the human is uncertain about the robot's type (i.e., the robot's internal state). We extend a recursive combination of Bayesian Nash equilibrium and the Bellman equation to solve for optimal robot policies. Interestingly, we discover that it is not always optimal for collaborative robots to be transparent; instead, human and robot teams can sometimes achieve higher rewards when the robot is opaque. In contrast to transparent robots, opaque robots select actions that withhold information from the human. Our analysis suggests that opaque behavior becomes optimal when either (a) human-robot interactions have a short time horizon or (b) users are slow to learn from the robot's actions. We extend this theoretical analysis to user studies across 43 total participants in both online and in-person settings. We find that -- during short interactions -- users reach higher rewards when working with opaque partners, and subjectively rate opaque robots as about equal to transparent robots. See videos of our experiments here: https://youtu.be/u8q1Z7WHUuI
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient ICLR 2025
Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often requires complex and deep architectures, which are computationally expensive and challenging to train. Within the world model, sequence models play a critical role in accurate predictions, and various architectures have been explored, each with its own challenges. Currently, recurrent neural network (RNN)-based world models struggle with vanishing gradients and capturing long-term dependencies. Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^2)$, where $n$ is the sequence length. To address these challenges, we propose a state space model (SSM)-based world model, Drama, specifically leveraging Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and enabling efficient training with longer sequences. We also introduce a novel sampling method to mitigate the suboptimality caused by an incorrect world model in the early training stages. Combining these techniques, Drama achieves a normalised score on the Atari100k benchmark that is competitive with other state-of-the-art (SOTA) model-based RL algorithms, using only a 7 million-parameter world model. Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Our code is available at https://github.com/realwenlongwang/Drama.git.
comment: Published as a conference paper at ICLR 2025
Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features
End-to-end learning directly maps sensory inputs to actions, creating highly integrated and efficient policies for complex robotics tasks. However, such models often struggle to generalize beyond their training scenarios, limiting adaptability to new environments, tasks, and concepts. In this work, we investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies under unseen text instructions and visual distribution shifts. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors, generating spatially aware embeddings that integrate semantic and visual information. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning on a small simulated dataset successfully generalize to real-world scenes with diverse novel goals and command formulations.
VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction
Next Best View (NBV) algorithms aim to acquire an optimal set of images using minimal resources, time, or number of captures to enable efficient 3D reconstruction of a scene. Existing approaches often rely on prior scene knowledge or additional image captures and often develop policies that maximize coverage. Yet, for many real scenes with complex geometry and self-occlusions, coverage maximization does not lead to better reconstruction quality directly. In this paper, we propose the View Introspection Network (VIN), which is trained to predict the reconstruction quality improvement of views directly, and the VIN-NBV policy. A greedy sequential sampling-based policy, where at each acquisition step, we sample multiple query views and choose the one with the highest VIN predicted improvement score. We design the VIN to perform 3D-aware featurization of the reconstruction built from prior acquisitions, and for each query view create a feature that can be decoded into an improvement score. We then train the VIN using imitation learning to predict the reconstruction improvement score. We show that VIN-NBV improves reconstruction quality by ~30% over a coverage maximization baseline when operating with constraints on the number of acquisitions or the time in motion.
comment: The paper has not gone through legal review. We will update it with new version once the review is complete
MetaSym: A Symplectic Meta-learning Framework for Physical Intelligence
Scalable and generalizable physics-aware deep learning has long been considered a significant challenge with various applications across diverse domains ranging from robotics to molecular dynamics. Central to almost all physical systems are symplectic forms, the geometric backbone that underpins fundamental invariants like energy and momentum. In this work, we introduce a novel deep learning framework, MetaSym. In particular, MetaSym combines a strong symplectic inductive bias obtained from a symplectic encoder, and an autoregressive decoder with meta-attention. This principled design ensures that core physical invariants remain intact, while allowing flexible, data-efficient adaptation to system heterogeneities. We benchmark MetaSym with highly varied and realistic datasets, such as a high-dimensional spring-mesh system (Otness et al., 2021), an open quantum system with dissipation and measurement backaction, and robotics-inspired quadrotor dynamics. Our results demonstrate superior performance in modeling dynamics under few-shot adaptation, outperforming state-of-the-art baselines that use larger models.
comment: 10 + 11 pages, 5 figures, 8 tables
Radiance Fields for Robotic Teleoperation IROS 2024
Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS), have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. For videos and code, check out https://rffr.leggedrobotics.com/works/teleoperation/.
comment: 8 pages, 10 figures, Accepted to IROS 2024
Distilling Contact Planning for Fast Trajectory Optimization in Robot Air Hockey
Robot control through contact is challenging as it requires reasoning over long horizons and discontinuous system dynamics. Highly dynamic tasks such as Air Hockey additionally require agile behavior, making the corresponding optimal control problems intractable for planning in realtime. Learning-based approaches address this issue by shifting computationally expensive reasoning through contacts to an offline learning phase. However, learning low-level motor policies subject to kinematic and dynamic constraints can be challenging if operating in proximity to such constraints is desired. This paper explores the combination of distilling a stochastic optimal control policy for high-level contact planning and online model-predictive control for low-level constrained motion planning. Our system learns to balance shooting accuracy and resulting puck speed by leveraging bank shots and the robot's kinematic structure. We show that the proposed framework outperforms purely control-based and purely learning-based techniques in both simulated and real-world games of Robot Air Hockey.
comment: Robotics: Science and Systems 2025
RGB-Event Fusion with Self-Attention for Collision Prediction
Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. This paper proposes a neural network framework for predicting the time and collision position of an unmanned aerial vehicle with a dynamic object, using RGB and event-based vision sensors. The proposed architecture consists of two separate encoder branches, one for each modality, followed by fusion by self-attention to improve prediction accuracy. To facilitate benchmarking, we leverage the ABCD [8] dataset collected that enables detailed comparisons of single-modality and fusion-based approaches. At the same prediction throughput of 50Hz, the experimental results show that the fusion-based model offers an improvement in prediction accuracy over single-modality approaches of 1% on average and 10% for distances beyond 0.5m, but comes at the cost of +71% in memory and + 105% in FLOPs. Notably, the event-based model outperforms the RGB model by 4% for position and 26% for time error at a similar computational cost, making it a competitive alternative. Additionally, we evaluate quantized versions of the event-based models, applying 1- to 8-bit quantization to assess the trade-offs between predictive performance and computational efficiency. These findings highlight the trade-offs of multi-modal perception using RGB and event-based cameras in robotic applications.
comment: arXiv admin note: text overlap with arXiv:2504.10400
Satellite Autonomous Clock Fault Monitoring with Inter-Satellite Ranges Using Euclidean Distance Matrices
To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a novel onboard clock phase jump detection framework for satellite constellations using range measurements obtained from dual one-way inter-satellite links. Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, providing flexibility for lunar satellite networks with diverse satellite types and operators. We model satellite constellations as graphs, where satellites are vertices and inter-satellite links are edges. The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique sub-graphs. The proposed method is validated through simulations of a GPS constellation and a notional constellation around the Moon, demonstrating its effectiveness in various configurations.
comment: This manuscript was submitted to the NAVIGATION: Journal of the Institute of Navigation
FALCON: Fast Autonomous Aerial Exploration using Coverage Path Guidance
This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. Despite recent advancements in the domain, existing exploration planners often suffer from inefficiencies such as frequent revisitations of previously explored regions.FALCON effectively harnesses the full potential of online generated coverage paths in enhancing exploration efficiency.The framework begins with an incremental connectivity-aware space decomposition and connectivity graph construction, which facilitate efficient coverage path planning.Subsequently, a hierarchical planner generates a coverage path spanning the entire unexplored space, serving as a global guidance.Then, a local planner optimizes the frontier visitation order, minimizing traversal time while consciously incorporating the intention of the global guidance.Finally, minimum-time smooth and safe trajectories are produced to visit the frontier viewpoints.For fair and comprehensive benchmark experiments, we introduce a lightweight exploration planner evaluation environment that allows for comparing exploration planners across a variety of testing scenarios using an identical quadrotor simulator.Additionally, an in-depth analysis and evaluation is conducted to highlight the significant performance advantages of FALCON in comparison with the state-of-the-art exploration planners based on objective criteria.Extensive ablation studies demonstrate the effectiveness of each component in the proposed framework.Real-world experiments conducted fully onboard further validate FALCON's practical capability in complex and challenging environments.The source code of both the exploration planner FALCON and the exploration planner evaluation environment has been released to benefit the community.
comment: Published in IEEE T-RO. Open source link: https://github.com/HKUST-Aerial-Robotics/FALCON
Demonstrating a Control Framework for Physical Human-Robot Interaction Toward Industrial Applications
Physical Human-Robot Interaction (pHRI) is critical for implementing Industry 5.0, which focuses on human-centric approaches. However, few studies explore the practical alignment of pHRI to industrial-grade performance. This paper introduces a versatile control framework designed to bridge this gap by incorporating the torque-based control modes: compliance control, null-space compliance, and dual compliance, all in static and dynamic scenarios. Thanks to our second-order Quadratic Programming (QP) formulation, strict kinematic and collision constraints are integrated into the system as safety features, and a weighted hierarchy guarantees singularity-robust task tracking performance. The framework is implemented on a Kinova Gen3 collaborative robot (cobot) equipped with a Bota force/torque sensor. A DualShock 4 game controller is attached to the robot's end-effector to demonstrate the framework's capabilities. This setup enables seamless dynamic switching between the modes, and real-time adjustments of parameters, such as transitioning between position and torque control or selecting a more robust custom-developed low-level torque controller over the default one. Built on the open-source robotic control software mc_rtc, our framework ensures reproducibility for both research and industrial deployment, this framework demonstrates a step toward industrial-grade performance and repeatability, showcasing its potential as a robust pHRI control system for industrial environments.
comment: Demo Paper submitted to Robotics: Science and Systems (RSS2025), accepted
Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching
Accurate robot localization is essential for effective operation. Monte Carlo Localization (MCL) is commonly used with known maps but is computationally expensive due to landmark matching for each particle. Humanoid robots face additional challenges, including sensor noise from locomotion vibrations and a limited field of view (FOV) due to camera placement. This paper proposes a fast and robust localization method via iterative landmark matching (ILM) for humanoid robots. The iterative matching process improves the accuracy of the landmark association so that it does not need MCL to match landmarks to particles. Pose estimation with the outlier removal process enhances its robustness to measurement noise and faulty detections. Furthermore, an additional filter can be utilized to fuse inertial data from the inertial measurement unit (IMU) and pose data from localization. We compared ILM with Iterative Closest Point (ICP), which shows that ILM method is more robust towards the error in the initial guess and easier to get a correct matching. We also compared ILM with the Augmented Monte Carlo Localization (aMCL), which shows that ILM method is much faster than aMCL and even more accurate. The proposed method's effectiveness is thoroughly evaluated through experiments and validated on the humanoid robot ARTEMIS during RoboCup 2024 adult-sized soccer competition.
GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field
Semantic-aware 3D scene reconstruction is essential for autonomous robots to perform complex interactions. Semantic SLAM, an online approach, integrates pose tracking, geometric reconstruction, and semantic mapping into a unified framework, shows significant potential. However, existing systems, which rely on 2D ground truth priors for supervision, are often limited by the sparsity and noise of these signals in real-world environments. To address this challenge, we propose GSFF-SLAM, a novel dense semantic SLAM system based on 3D Gaussian Splatting that leverages feature fields to achieve joint rendering of appearance, geometry, and N-dimensional semantic features. By independently optimizing feature gradients, our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals. Experimental results demonstrate that our approach outperforms previous methods in both tracking accuracy and photorealistic rendering quality. When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03\% mIoU, while achieving up to 2.9$\times$ speedup with only marginal performance degradation.
Task Reconstruction and Extrapolation for $π_0$ using Text Latent
Vision-language-action models (VLAs) often achieve high performance on demonstrated tasks but struggle significantly when required to extrapolate, combining skills learned from different tasks in novel ways. For instance, VLAs might successfully put the cream cheese in the bowl and put the bowl on top of the cabinet, yet still fail to put the cream cheese on top of the cabinet. In this work, we demonstrate that behaviors from distinct tasks can be effectively recombined by manipulating the VLA's internal representations at inference time. Concretely, we identify the text latent by averaging the text tokens' hidden states across all demonstrated trajectories for a specific base task. For executing an extrapolated task, we can temporally interpolate the text latent of the two base tasks and add it back to the text hidden states, so sub-behaviors from the two tasks will be activated sequentially. We evaluate this approach using the newly created libero-ood benchmark, featuring 20 tasks extrapolated from standard LIBERO suites. The results on libero-ood show that all SOTA VLAs achieve < 15% success rate, while $\pi0$ with text latent interpolation reaches an 83% success rate. Further qualitative analysis reveals a tendency for VLAs to exhibit spatial overfitting, mapping object names to demonstrated locations rather than achieving genuine object and goal understanding. Additionally, we find that decoding the text latent yields human-unreadable prompts that can nevertheless instruct the VLA to achieve a 70% success rate on standard LIBERO suites, enabling private instruction or backdoor attacks.
Towards Safe and Efficient Through-the-Canopy Autonomous Fruit Counting with UAVs
We present an autonomous aerial system for safe and efficient through-the-canopy fruit counting. Aerial robot applications in large-scale orchards face significant challenges due to the complexity of fine-tuning flight paths based on orchard layouts, canopy density, and plant variability. Through-the-canopy navigation is crucial for minimizing occlusion by leaves and branches but is more challenging due to the complex and dense environment compared to traditional over-the-canopy flights. Our system addresses these challenges by integrating: i) a high-fidelity simulation framework for optimizing flight trajectories, ii) a low-cost autonomy stack for canopy-level navigation and data collection, and iii) a robust workflow for fruit detection and counting using RGB images. We validate our approach through fruit counting with canopy-level aerial images and by demonstrating the autonomous navigation capabilities of our experimental vehicle.
Self-Supervised Learning for Robotic Leaf Manipulation: A Hybrid Geometric-Neural Approach
Automating leaf manipulation in agricultural settings faces significant challenges, including the variability of plant morphologies and deformable leaves. We propose a novel hybrid geometric-neural approach for autonomous leaf grasping that combines traditional computer vision with neural networks through self-supervised learning. Our method integrates YOLOv8 for instance segmentation and RAFT-Stereo for 3D depth estimation to build rich leaf representations, which feed into both a geometric feature scoring pipeline and a neural refinement module (GraspPointCNN). The key innovation is our confidence-weighted fusion mechanism that dynamically balances the contribution of each approach based on prediction certainty. Our self-supervised framework uses the geometric pipeline as an expert teacher to automatically generate training data. Experiments demonstrate that our approach achieves an 88.0% success rate in controlled environments and 84.7% in real greenhouse conditions, significantly outperforming both purely geometric (75.3%) and neural (60.2%) methods. This work establishes a new paradigm for agricultural robotics where domain expertise is seamlessly integrated with machine learning capabilities, providing a foundation for fully automated crop monitoring systems.
comment: 15 pages, 9 figures
An introduction to using dual quaternions to study kinematics
We explain the use of dual quaternions to represent poses, twists, and wrenches.
comment: Added a natural explanation for how a unit dual quaternion acts upon a vector. arXiv admin note: substantial text overlap with arXiv:2202.09268
Multiagent Systems
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Artificial intelligence (AI) is reshaping strategic planning, with Multi-Agent Reinforcement Learning (MARL) enabling coordination among autonomous agents in complex scenarios. However, its practical deployment in sensitive military contexts is constrained by the lack of explainability, which is an essential factor for trust, safety, and alignment with human strategies. This work reviews and assesses current advances in explainability methods for MARL with a focus on simulated air combat scenarios. We proceed by adapting various explainability techniques to different aerial combat scenarios to gain explanatory insights about the model behavior. By linking AI-generated tactics with human-understandable reasoning, we emphasize the need for transparency to ensure reliable deployment and meaningful human-machine interaction. By illuminating the crucial importance of explainability in advancing MARL for operational defense, our work supports not only strategic planning but also the training of military personnel with insightful and comprehensible analyses.
comment: Published as a journal chapter in NATO Journal of Science and Technology
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, inadvertently enabling LLMs to "time travel"-leveraging future information embedded in their training corpora, thus resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLM in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data-specifically data published after each model pretraining cutoff-to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions-including ticker-level analysis, investment decision-making, portfolio management, and risk control-reveal significant practical challenges. Notably, even cutting-edge models such as DeepSeek-V3 and Claude-3.7-Sonnet incur net trading losses within DeepFund real-time evaluation environment, underscoring the present limitations of LLMs for active fund management. Our code is available at https://github.com/HKUSTDial/DeepFund.
comment: 21 pages, 9 figures
Humans expect rationality and cooperation from LLM opponents in strategic games
As Large Language Models (LLMs) integrate into our social and economic interactions, we need to deepen our understanding of how humans respond to LLMs opponents in strategic settings. We present the results of the first controlled monetarily-incentivised laboratory experiment looking at differences in human behaviour in a multi-player p-beauty contest against other humans and LLMs. We use a within-subject design in order to compare behaviour at the individual level. We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than humans, which is mainly driven by the increased prevalence of `zero' Nash-equilibrium choices. This shift is mainly driven by subjects with high strategic reasoning ability. Subjects who play the zero Nash-equilibrium choice motivate their strategy by appealing to perceived LLM's reasoning ability and, unexpectedly, propensity towards cooperation. Our findings provide foundational insights into the multi-player human-LLM interaction in simultaneous choice games, uncover heterogeneities in both subjects' behaviour and beliefs about LLM's play when playing against them, and suggest important implications for mechanism design in mixed human-LLM systems.
Vaiage: A Multi-Agent Solution to Personalized Travel Planning
Planning trips is a cognitively intensive task involving conflicting user preferences, dynamic external information, and multi-step temporal-spatial optimization. Traditional platforms often fall short - they provide static results, lack contextual adaptation, and fail to support real-time interaction or intent refinement. Our approach, Vaiage, addresses these challenges through a graph-structured multi-agent framework built around large language models (LLMs) that serve as both goal-conditioned recommenders and sequential planners. LLMs infer user intent, suggest personalized destinations and activities, and synthesize itineraries that align with contextual constraints such as budget, timing, group size, and weather. Through natural language interaction, structured tool use, and map-based feedback loops, Vaiage enables adaptive, explainable, and end-to-end travel planning grounded in both symbolic reasoning and conversational understanding. To evaluate Vaiage, we conducted human-in-the-loop experiments using rubric-based GPT-4 assessments and qualitative feedback. The full system achieved an average score of 8.5 out of 10, outperforming the no-strategy (7.2) and no-external-API (6.8) variants, particularly in feasibility. Qualitative analysis indicated that agent coordination - especially the Strategy and Information Agents - significantly improved itinerary quality by optimizing time use and integrating real-time context. These results demonstrate the effectiveness of combining LLM reasoning with symbolic agent coordination in open-ended, real-world planning tasks.
Geofenced Unmanned Aerial Robotic Defender for Deer Detection and Deterrence (GUARD) ICRA
Wildlife-induced crop damage, particularly from deer, threatens agricultural productivity. Traditional deterrence methods often fall short in scalability, responsiveness, and adaptability to diverse farmland environments. This paper presents an integrated unmanned aerial vehicle (UAV) system designed for autonomous wildlife deterrence, developed as part of the Farm Robotics Challenge. Our system combines a YOLO-based real-time computer vision module for deer detection, an energy-efficient coverage path planning algorithm for efficient field monitoring, and an autonomous charging station for continuous operation of the UAV. In collaboration with a local Minnesota farmer, the system is tailored to address practical constraints such as terrain, infrastructure limitations, and animal behavior. The solution is evaluated through a combination of simulation and field testing, demonstrating robust detection accuracy, efficient coverage, and extended operational time. The results highlight the feasibility and effectiveness of drone-based wildlife deterrence in precision agriculture, offering a scalable framework for future deployment and extension.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We believe this work provides insights into multi-agent system safety and contributes to the development of robust, trustworthy AI interactions.
Toward Adaptive Categories: Dimensional Governance for Agentic AI
As AI systems evolve from static tools to dynamic agents, traditional categorical governance frameworks -- based on fixed risk tiers, levels of autonomy, or human oversight models -- are increasingly insufficient on their own. Systems built on foundation models, self-supervised learning, and multi-agent architectures increasingly blur the boundaries that categories were designed to police. In this Perspective, we make the case for dimensional governance: a framework that tracks how decision authority, process autonomy, and accountability (the 3As) distribute dynamically across human-AI relationships. A critical advantage of this approach is its ability to explicitly monitor system movement toward and across key governance thresholds, enabling preemptive adjustments before risks materialize. This dimensional approach provides the necessary foundation for more adaptive categorization, enabling thresholds and classifications that can evolve with emerging capabilities. While categories remain essential for decision-making, building them upon dimensional foundations allows for context-specific adaptability and stakeholder-responsive governance that static approaches cannot achieve. We outline key dimensions, critical trust thresholds, and practical examples illustrating where rigid categorical frameworks fail -- and where a dimensional mindset could offer a more resilient and future-proof path forward for both governance and innovation at the frontier of artificial intelligence.
comment: 12 pages core text, 14 pages including references, 2 figures
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents
Extracting alphanumeric data from form-like documents such as invoices, purchase orders, bills, and financial documents is often performed via vision (OCR) and learning algorithms or monolithic pipelines with limited potential for systemic improvements. We propose an agentic AI system that leverages Large Language Model (LLM) agents and a reinforcement learning (RL) driver agent to automate consistent, self-improving extraction under LLM inference uncertainty. Our work highlights the limitations of monolithic LLM-based extraction and introduces a modular, multi-agent framework with task-specific prompts and an RL policy of rewards and penalties to guide a meta-prompting agent to learn from past errors and improve prompt-based actor agents. This self-corrective adaptive system handles diverse documents, file formats, layouts, and LLMs, aiming to automate accurate information extraction without the need for human intervention. Results as reported on two benchmark datasets of SOIRE, and CORD, are promising for the agentic AI framework.
AVA: Attentive VLM Agent for Mastering StarCraft II
We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Traditional frameworks such as SMAC rely on abstract state representations that diverge significantly from human perception, limiting the ecological validity of agent behavior. Our agent addresses this limitation by incorporating RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay. The AVA architecture consists of three integrated components: (1) a vision-language model enhanced with specialized self-attention mechanisms for strategic unit targeting and battlefield assessment, (2) a retrieval-augmented generation system that leverages domain-specific StarCraft II knowledge to inform tactical decisions, and (3) a dynamic role-based task distribution system that enables coordinated multi-agent behavior. The experimental evaluation in our proposed AVACraft environment, which contains 21 multimodal StarCraft II scenarios, demonstrates that AVA powered by foundation models (specifically Qwen-VL and GPT-4o) can execute complex tactical maneuvers without explicit training, achieving comparable performance to traditional MARL methods that require substantial training iterations. This work establishes a foundation for developing human-aligned StarCraft II agents and advances the broader research agenda of multimodal game AI. Our implementation is available at https://github.com/camel-ai/VLM-Play-StarCraft2.
comment: Under Review
Satellite Autonomous Clock Fault Monitoring with Inter-Satellite Ranges Using Euclidean Distance Matrices
To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a novel onboard clock phase jump detection framework for satellite constellations using range measurements obtained from dual one-way inter-satellite links. Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, providing flexibility for lunar satellite networks with diverse satellite types and operators. We model satellite constellations as graphs, where satellites are vertices and inter-satellite links are edges. The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique sub-graphs. The proposed method is validated through simulations of a GPS constellation and a notional constellation around the Moon, demonstrating its effectiveness in various configurations.
comment: This manuscript was submitted to the NAVIGATION: Journal of the Institute of Navigation
EconoJax: A Fast & Scalable Economic Simulation in Jax AAMAS 2025
Accurate economic simulations often require many experimental runs, particularly when combined with reinforcement learning. Unfortunately, training reinforcement learning agents in multi-agent economic environments can be slow. This paper introduces EconoJax, a fast simulated economy, based on the AI economist. EconoJax, and its training pipeline, are completely written in JAX. This allows EconoJax to scale to large population sizes and perform large experiments, while keeping training times within minutes. Through experiments with populations of 100 agents, we show how real-world economic behavior emerges through training within 15 minutes, in contrast to previous work that required several days. We additionally perform experiments in varying sized action spaces to test if some multi-agent methods produce more diverse behavior compared to others. Here, our findings indicate no notable differences in produced behavior with different methods as is sometimes suggested in earlier works. To aid further research, we open-source EconoJax on Github.
comment: 8 pages, updated and extended version, accepted at AAMAS 2025
Multicopy Reinforcement Learning Agents AAMAS
This paper examines a novel type of multi-agent problem, in which an agent makes multiple identical copies of itself in order to achieve a single agent task better or more efficiently. This strategy improves performance if the environment is noisy and the task is sometimes unachievable by a single agent copy. We propose a learning algorithm for this multicopy problem which takes advantage of the structure of the value function to efficiently learn how to balance the advantages and costs of adding additional copies.
comment: From ALA Workshop at AAMAS Conference May 2025
Systems and Control (CS)
REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving
Achieving rapid and effective active collision avoidance in dynamic interactive traffic remains a core challenge for autonomous driving. This paper proposes REACT (Runtime-Enabled Active Collision-avoidance Technique), a closed-loop framework that integrates risk assessment with active avoidance control. By leveraging energy transfer principles and human-vehicle-road interaction modeling, REACT dynamically quantifies runtime risk and constructs a continuous spatial risk field. The system incorporates physically grounded safety constraints such as directional risk and traffic rules to identify high-risk zones and generate feasible, interpretable avoidance behaviors. A hierarchical warning trigger strategy and lightweight system design enhance runtime efficiency while ensuring real-time responsiveness. Evaluations across four representative high-risk scenarios including car-following braking, cut-in, rear-approaching, and intersection conflict demonstrate REACT's capability to accurately identify critical risks and execute proactive avoidance. Its risk estimation aligns closely with human driver cognition (i.e., warning lead time < 0.4 s), achieving 100% safe avoidance with zero false alarms or missed detections. Furthermore, it exhibits superior real-time performance (< 50 ms latency), strong foresight, and generalization. The lightweight architecture achieves state-of-the-art accuracy, highlighting its potential for real-time deployment in safety-critical autonomous systems.
comment: 22 pages, 11 figures
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting
Accurate electricity load forecasting is essential for grid stability, resource optimization, and renewable energy integration. While transformer-based deep learning models like TimeGPT have gained traction in time-series forecasting, their effectiveness in long-term electricity load prediction remains uncertain. This study evaluates forecasting models ranging from classical regression techniques to advanced deep learning architectures using data from the ESD 2025 competition. The dataset includes two years of historical electricity load data, alongside temperature and global horizontal irradiance (GHI) across five sites, with a one-day-ahead forecasting horizon. Since actual test set load values remain undisclosed, leveraging predicted values would accumulate errors, making this a long-term forecasting challenge. We employ (i) Principal Component Analysis (PCA) for dimensionality reduction and (ii) frame the task as a regression problem, using temperature and GHI as covariates to predict load for each hour, (iii) ultimately stacking 24 models to generate yearly forecasts. Our results reveal that deep learning models, including TimeGPT, fail to consistently outperform simpler statistical and machine learning approaches due to the limited availability of training data and exogenous variables. In contrast, XGBoost, with minimal feature engineering, delivers the lowest error rates across all test cases while maintaining computational efficiency. This highlights the limitations of deep learning in long-term electricity forecasting and reinforces the importance of model selection based on dataset characteristics rather than complexity. Our study provides insights into practical forecasting applications and contributes to the ongoing discussion on the trade-offs between traditional and modern forecasting methods.
Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. However, they face significant challenges, including public acceptance, limited longevity, and barriers to commercialization. Meanwhile, noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. To address these issues, this study introduces a novel human-centered multimodal AI approach as intelligent compensatory mechanisms for lost motor functions that could potentially enable patients with severe paralysis to control high-dimensional assistive devices, such as dexterous robotic arms, using limited and noninvasive inputs. In contrast to the current state-of-the-art (SoTA) noninvasive approaches, our context-aware, multimodal shared-autonomy framework integrates deep reinforcement learning algorithms to blend limited low-dimensional user input with real-time environmental perception, enabling adaptive, dynamic, and intelligent interpretation of human intent for complex dexterous manipulation tasks, such as pick-and-place. The results from our ARAS (Adaptive Reinforcement learning for Amplification of limited inputs in Shared autonomy) trained with synthetic users over 50,000 computer simulation episodes demonstrated the first successful implementation of the proposed closed-loop human-in-the-loop paradigm, outperforming the SoTA shared autonomy algorithms. Following a zero-shot sim-to-real transfer, ARAS was evaluated on 23 human subjects, demonstrating high accuracy in dynamic intent detection and smooth, stable 3D trajectory control for dexterous pick-and-place tasks. ARAS user study achieved a high task success rate of 92.88%, with short completion times comparable to those of SoTA invasive assistive technologies.
Bilevel Transmission Expansion Planning with Joint Chance-Constrained Dispatch
In transmission expansion planning (TEP), network planners make long-term investment decisions while anticipating market clearing outcomes that are increasingly affected by renewable generation uncertainty. Additionally, market participants' sensitivity to network charges and the requirement for cost recovery by the network planner introduce further complexity. Since the day-ahead market clears before uncertainty realizes, explicitly modelling these uncertainties at the lower-level market clearing becomes important in bilevel TEP problems. In this paper, we introduce a novel bilevel TEP framework with lower-level joint chance-constrained market clearing that manages line flow constraints under wind uncertainty and accounts for the effect of network tariffs on participants' actual marginal costs and utility. To solve this complex problem, we propose a Strengthened Linear Approximation (SLA) technique for handling Wasserstein distributionally robust joint chance constraints with right-hand-side uncertainties (RHS-WDRJCC). The proposed method offers more efficient approximations without additional conservativeness and avoids the numerical issues encountered in existing approaches by introducing valid inequalities. The case study demonstrates that the proposed model achieves the desired out-of-sample constraint satisfaction probability. Moreover, the numerical results highlight the significant computational advantage of SLA, achieving up to a 26x speedup compared to existing methods such as worst-case conditional value-at-risk, while maintaining high solution quality.
Formal Uncertainty Propagation for Stochastic Dynamical Systems with Additive Noise
In this paper, we consider discrete-time non-linear stochastic dynamical systems with additive process noise in which both the initial state and noise distributions are uncertain. Our goal is to quantify how the uncertainty in these distributions is propagated by the system dynamics for possibly infinite time steps. In particular, we model the uncertainty over input and noise as ambiguity sets of probability distributions close in the $\rho$-Wasserstein distance and aim to quantify how these sets evolve over time. Our approach relies on results from quantization theory, optimal transport, and stochastic optimization to construct ambiguity sets of distributions centered at mixture of Gaussian distributions that are guaranteed to contain the true sets for both finite and infinite prediction time horizons. We empirically evaluate the effectiveness of our framework in various benchmarks from the control and machine learning literature, showing how our approach can efficiently and formally quantify the uncertainty in linear and non-linear stochastic dynamical systems.
Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Electrovibration technology can render tactile textures on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction force generated by applying an alternating voltage signal to the screen. This signal should be carefully calibrated for realistic and robust texture rendering. However, this process is challenging due to variations in sliding speed, applied force, and individual skin mechanics, which affect friction in complex and unpredictable ways. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N) while we measured contact forces and skin accelerations. The touchscreen was excited with amplitude-modulated voltage signals across frequencies relevant to touch. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Our results showed that the sliding speed influenced the cutoff frequency of the friction response as well as the moving mass and stiffness of the finger for the tested exploration ranges. Specifically, for every 1 mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Further correlation analysis revealed that finger stiffness affected the cutoff frequency more than the moving mass. Finally, we developed a practical model for electrovibration-induced finger friction on touchscreens that accounts for sliding speed variations, paving the way for delivering consistent haptic feedback through electrovibration.
comment: 19 pages, 13 figures, journal
Event disturbance rejection: a case study
This article introduces the problem of robust event disturbance rejection. Inspired by the design principle of linear output regulation, a control structure based on excitable systems is proposed. Unlike the linear case, contraction of the closed-loop system must be enforced through specific input signals. This induced contraction enables a steady-state analysis similar to the linear case. Thanks to the excitable nature of the systems, the focus shifts from precise trajectory tracking to the regulation of discrete events, such as spikes. The study emphasizes rejecting events rather than trajectories and demonstrates the robustness of the approach, even under mismatches between the controller and the exosystem. This work is a first step towards developing a design principle for event regulation.
comment: Accepted at NOLCOS (13th IFAC Symposium on Nonlinear Control Systems)
LLM-Enhanced Symbolic Control for Safety-Critical Applications
Motivated by Smart Manufacturing and Industry 4.0, we introduce a framework for synthesizing Abstraction-Based Controller Design (ABCD) for reach-avoid problems from Natural Language (NL) specifications using Large Language Models (LLMs). A Code Agent interprets an NL description of the control problem and translates it into a formal language interpretable by state-of-the-art symbolic control software, while a Checker Agent verifies the correctness of the generated code and enhances safety by identifying specification mismatches. Evaluations show that the system handles linguistic variability and improves robustness over direct planning with LLMs. The proposed approach lowers the barrier to formal control synthesis by enabling intuitive, NL-based task definition while maintaining safety guarantees through automated validation.
Beyond KL-divergence: Risk Aware Control Through Cross Entropy and Adversarial Entropy Regularization
While the idea of robust dynamic programming (DP) is compelling for systems affected by uncertainty, addressing worst-case disturbances generally results in excessive conservatism. This paper introduces a method for constructing control policies robust to adversarial disturbance distributions that relate to a provided empirical distribution. The character of the adversary is shaped by a regularization term comprising a weighted sum of (i) the cross-entropy between the empirical and the adversarial distributions, and (ii) the entropy of the adversarial distribution itself. The regularization weights are interpreted as the likelihood factor and the temperature respectively. The proposed framework leads to an efficient DP-like algorithm -- referred to as the minsoftmax algorithm -- to obtain the optimal control policy, where the disturbances follow an analytical softmax distribution in terms of the empirical distribution, temperature, and likelihood factor. It admits a number of control-theoretic interpretations and can thus be understood as a flexible tool for integrating complementary features of related control frameworks. In particular, in the linear model quadratic cost setting, with a Gaussian empirical distribution, we draw connections to the well-known $\mathcal{H}_{\infty}$-control. We illustrate our results through a numerical example.
Lifelong reinforcement learning for health-aware fast charging of lithium-ion batteries
Fast charging of lithium-ion batteries remains a critical bottleneck for widespread adoption of electric vehicles and stationary energy storage systems, as improperly designed fast charging can accelerate battery degradation and shorten lifespan. In this work, we address this challenge by proposing a health-aware fast charging strategy that explicitly balances charging speed and battery longevity across the entire service life. The key innovation lies in establishing a mapping between anode overpotential and the state of health (SoH) of battery, which is then used to constrain the terminal charging voltage in a twin delayed deep deterministic policy gradient (TD3) framework. By incorporating this SoH-dependent voltage constraint, our designed deep learning method mitigates side reactions and effectively extends battery life. To validate the proposed approach, a high-fidelity single particle model with electrolyte is implemented in the widely adopted PyBaMM simulation platform, capturing degradation phenomena at realistic scales. Comparative life-cycle simulations against conventional CC-CV, its variants, and constant current-constant overpotential methods show that the TD3-based controller reduces overall degradation while maintaining competitively fast charge times. These results demonstrate the practical viability of deep reinforcement learning for advanced battery management systems and pave the way for future explorations of health-aware, performance-optimized charging strategies.
User-centric Vehicle-to-Grid Optimization with an Input Convex Neural Network-based Battery Degradation Model
We propose a data-driven, user-centric vehicle-to-grid (V2G) methodology based on multi-objective optimization to balance battery degradation and V2G revenue according to EV user preference. Given the lack of accurate and generalizable battery degradation models, we leverage input convex neural networks (ICNNs) to develop a data-driven degradation model trained on extensive experimental datasets. This approach enables our model to capture nonconvex dependencies on battery temperature and time while maintaining convexity with respect to the charging rate. Such a partial convexity property ensures that the second stage of our methodology remains computationally efficient. In the second stage, we integrate our data-driven degradation model into a multi-objective optimization framework to generate an optimal smart charging profile for each EV. This profile effectively balances the trade-off between financial benefits from V2G participation and battery degradation, controlled by a hyperparameter reflecting the user prioritization of battery health. Numerical simulations show the high accuracy of the ICNN model in predicting battery degradation for unseen data. Finally, we present a trade-off curve illustrating financial benefits from V2G versus losses from battery health degradation based on user preferences and showcase smart charging strategies under realistic scenarios.
A User-centric Game for Balancing V2G Benefits with Battery Degradation of Electric Vehicles
We present a novel user centric vehicle to grid framework that enables electric vehicle users to balance the trade off between financial benefits from VtoG and battery health degradation based on individual preference signals.
DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production
Plastic injection molding remains essential to modern manufacturing. However, optimizing process parameters to balance product quality and profitability under dynamic environmental and economic conditions remains a persistent challenge. This study presents a novel deep reinforcement learning (DRL)-based framework for real-time process optimization in injection molding, integrating product quality and profitability into the control objective. A profit function was developed to reflect real-world manufacturing costs, incorporating resin, mold wear, and electricity prices, including time-of-use variations. Surrogate models were constructed to predict product quality and cycle time, enabling efficient offline training of DRL agents using soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. Experimental results demonstrate that the proposed DRL framework can dynamically adapt to seasonal and operational variations, consistently maintaining product quality while maximizing profit. Compared to traditional optimization methods such as genetic algorithms, the DRL models achieved comparable economic performance with up to 135x faster inference speeds, making them well-suited for real-time applications. The framework's scalability and adaptability highlight its potential as a foundation for intelligent, data-driven decision-making in modern manufacturing environments.
comment: 50 pages, 10 figures
A Scalable Procedure for $\mathcal{H}_{\infty}-$Control Design
This paper proposes a novel gradient based scalable procedure for $\mathcal{H}_{\infty}-$control design. We compute the gradient using algebraic Riccati equation and then couple it with a novel Armijo rule inspired step-size selection procedure. We perform numerical experiments of the proposed solution procedure on an exhaustive list of benchmark engineering systems to show its convergence properties. Finally we compare our proposed solution procedure with available semi-definite programming based gradient-descent algorithm to demonstrate its scalability.
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS
Simultaneous localization and mapping (SLAM) approaches for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments possess additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Utilizing the modified Hausdorff distance (MHD) metric, the method can solve the scan matching robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.
Comparative Analysis of Black-Box Optimization Methods for Weather Intervention Design
As climate change increases the threat of weather-related disasters, research on weather control is gaining importance. The objective of weather control is to mitigate disaster risks by administering interventions with optimal timing, location, and intensity. However, the optimization process is highly challenging due to the vast scale and complexity of weather phenomena, which introduces two major challenges. First, obtaining accurate gradient information for optimization is difficult. In addition, numerical weather prediction (NWP) models demand enormous computational resources, necessitating parameter optimization with minimal function evaluations. To address these challenges, this study proposes a method for designing weather interventions based on black-box optimization, which enables efficient exploration without requiring gradient information. The proposed method is evaluated in two distinct control scenarios: one-shot initial value intervention and sequential intervention based on model predictive control. Furthermore, a comparative analysis is conducted among four representative black-box optimization methods in terms of total rainfall reduction. Experimental results show that Bayesian optimization achieves higher control effectiveness than the others, particularly in high-dimensional search spaces. These findings suggest that Bayesian optimization is a highly effective approach for weather intervention computation.
comment: 15 pages, 11 figures
Enhancing Secrecy Energy Efficiency in RIS-Aided Aerial Mobile Edge Computing Networks: A Deep Reinforcement Learning Approach
This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)-aided unmanned aerial vehicle (UAV)-mobile edge computing (MEC) scheme to enhance the secure task offloading while minimizing the energy consumption of the UAV subject to task completion constraints. Leveraging a data-driven approach, we propose a comprehensive optimization strategy that jointly optimizes the aerial MEC (AMEC)'s trajectory, task offloading partitioning, UE transmission scheduling, and RIS phase shifts. Our objective centers on optimizing the secrecy energy efficiency (SEE) of UE task offloading transmissions while preserving the AMEC's energy resources and meeting the task completion time requirements. Numerical results show that the proposed solution can effectively safeguard legitimate task offloading transmissions while preserving AMEC energy.
comment: This article has been accepted for publication in the IEEE 2025 International Conference on Communications (ICC2025)
RAN Tester UE: An Automated Declarative UE Centric Security Testing Platform
Cellular networks require strict security procedures and measures across various network components, from core to radio access network (RAN) and end-user devices. As networks become increasingly complex and interconnected, as in O-RAN deployments, they are exposed to a numerous security threats. Therefore, ensuring robust security is critical for O-RAN to protect network integrity and safeguard user data. This requires rigorous testing methodologies to mitigate threats. This paper introduces an automated, adaptive, and scalable user equipment (UE) based RAN security testing framework designed to address the shortcomings of existing RAN testing solutions. Experimental results on a 5G software radio testbed built with commercial off-the-shelf hardware and open source software validate the efficiency and reproducibility of sample security test procedures developed on the RAN Tester UE framework.
comment: This article has been accepted for publication in the ACM Symposium on Access Control Models and Technologies (SACMAT'25)
Optimal $\mathbb{H}_2$ Control with Passivity-Constrained Feedback: Convex Approach
We consider the $\mathbb{H}_2$-optimal feedback control problem, for the case in which the plant is passive with bounded $\mathbb{L}_2$ gain, and the feedback law is constrained to be output-strictly passive. In this circumstance, we show that this problem distills to a convex optimal control problem, in which the optimization domain is the associated Youla parameter for the closed-loop system. This enables the globally-optimal controller to be solved as an infinite-dimensional but convex optimization. Near-optimal solutions may be found through the finite-dimensional convex truncation of this infinite-dimensional domain. The idea is demonstrated on a simple vibration suppression example.
On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $l_1$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 28 pages, 2 figures
Data-based control of Logical Networks
In recent years, data-driven approaches have become increasingly pervasive across all areas of control engineering. However, the applications of data-based techniques to Boolean Control Networks (BCNs) are still very limited. In this paper we aim to fill this gap, by exploring the possibility of solving three fundamental control problems, i.e., state feedback stabilization, safe control and output regulation, for a BCN, leveraging only a limited amount of data generated by the network, without knowing or identifying its model.
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
comment: Uploaded by mistake, this is second version of the previous upload real-time deformation correction paper (arXiv:2406.07797). I will upload this paper as a second version of that paper
Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses
State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts law task that involved the aperture of the cursor and the control of orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Damping Identification Sensitivity in Flutter Speed Estimation
Predicting flutter remains a key challenge in aeroelastic research, with certain models relying on modal parameters, such as natural frequencies and damping ratios. These models are particularly useful in early design stages or for the development of small Unmanned Aerial Vehicles (maximum take-off mass below 7 kg). This study evaluates two frequency-domain system identification methods, Fast Relaxed Vector Fitting (FRVF) and the Loewner Framework (LF), for predicting the flutter onset speed of a flexible wing model. Both methods are applied to extract modal parameters from Ground Vibration Testing data, which are subsequently used to develop a reduced-order model with two degrees of freedom. Results indicate that FRVF and LF-informed models provide reliable flutter speed, with predictions deviating by no more than 3% (FRVF) and 5% (LF) from the N4SID-informed benchmark. The findings highlight the sensitivity of flutter speed predictions to damping ratio identification accuracy and demonstrate the potential of these methods as computationally efficient alternatives for preliminary aeroelastic assessments.
Satellite Autonomous Clock Fault Monitoring with Inter-Satellite Ranges Using Euclidean Distance Matrices
To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a novel onboard clock phase jump detection framework for satellite constellations using range measurements obtained from dual one-way inter-satellite links. Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, providing flexibility for lunar satellite networks with diverse satellite types and operators. We model satellite constellations as graphs, where satellites are vertices and inter-satellite links are edges. The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique sub-graphs. The proposed method is validated through simulations of a GPS constellation and a notional constellation around the Moon, demonstrating its effectiveness in various configurations.
comment: This manuscript was submitted to the NAVIGATION: Journal of the Institute of Navigation
Demonstrating a Control Framework for Physical Human-Robot Interaction Toward Industrial Applications
Physical Human-Robot Interaction (pHRI) is critical for implementing Industry 5.0, which focuses on human-centric approaches. However, few studies explore the practical alignment of pHRI to industrial-grade performance. This paper introduces a versatile control framework designed to bridge this gap by incorporating the torque-based control modes: compliance control, null-space compliance, and dual compliance, all in static and dynamic scenarios. Thanks to our second-order Quadratic Programming (QP) formulation, strict kinematic and collision constraints are integrated into the system as safety features, and a weighted hierarchy guarantees singularity-robust task tracking performance. The framework is implemented on a Kinova Gen3 collaborative robot (cobot) equipped with a Bota force/torque sensor. A DualShock 4 game controller is attached to the robot's end-effector to demonstrate the framework's capabilities. This setup enables seamless dynamic switching between the modes, and real-time adjustments of parameters, such as transitioning between position and torque control or selecting a more robust custom-developed low-level torque controller over the default one. Built on the open-source robotic control software mc_rtc, our framework ensures reproducibility for both research and industrial deployment, this framework demonstrates a step toward industrial-grade performance and repeatability, showcasing its potential as a robust pHRI control system for industrial environments.
comment: Demo Paper submitted to Robotics: Science and Systems (RSS2025), accepted
A finite-sample bound for identifying partially observed linear switched systems from a single trajectory
We derive a finite-sample probabilistic bound on the parameter estimation error of a system identification algorithm for Linear Switched Systems. The algorithm estimates Markov parameters from a single trajectory and applies a variant of the Ho-Kalman algorithm to recover the system matrices. Our bound guarantees statistical consistency under the assumption that the true system exhibits quadratic stability. The proof leverages the theory of weakly dependent processes. To the best of our knowledge, this is the first finite-sample bound for this algorithm in the single-trajectory setting.
Large Vision Model-Enhanced Digital Twin with Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks
Optimization of user association in a densely deployed cellular network is usually challenging and even more complicated due to the dynamic nature of user mobility and fluctuation in user counts. While deep reinforcement learning (DRL) emerges as a promising solution, its application in practice is hindered by high trial-and-error costs in real world and unsatisfactory physical network performance during training. Also, existing DRL-based user association methods are typically applicable to scenarios with a fixed number of users due to convergence and compatibility challenges. To address these limitations, we introduce a large vision model (LVM)-enhanced digital twin (DT) for wireless networks and propose a parallel DT-driven DRL method for user association and load balancing in networks with dynamic user counts, distribution, and mobility patterns. To construct this LVM-enhanced DT for DRL training, we develop a zero-shot generative user mobility model, named Map2Traj, based on the diffusion model. Map2Traj estimates user trajectory patterns and spatial distributions solely from street maps. DRL models undergo training in the DT environment, avoiding direct interactions with physical networks. To enhance the generalization ability of DRL models for dynamic scenarios, a parallel DT framework is further established to alleviate strong correlation and non-stationarity in single-environment training and improve training efficiency. Numerical results show that the developed LVM-enhanced DT achieves closely comparable training efficacy to the real environment, and the proposed parallel DT framework even outperforms the single real-world environment in DRL training with nearly 20\% gain in terms of cell-edge user performance.
comment: arXiv admin note: text overlap with arXiv:2407.19765. This work has been submitted to the IEEE for possible publication
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: 9 pages, 8 figures, conference paper submitted to IEEE
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: 19 pages, 6 figures, 2 author photos. Affiliations: Electrical Engineering Department, Pennsylvania State University. Keywords: Lyapunov Stability, Model Predictive Control (MPC), Control Optimization, Koopman Operator, Bilinear Systems, Nonlinear Systems, Closed-loop Design, Control Constraints
Economic Capacity Withholding Bounds of Competitive Energy Storage Bidders
Economic withholding in electricity markets refers to generators bidding higher than their true marginal fuel cost, and is a typical approach to exercising market power. However, existing market designs require storage to design bids strategically based on their own future price predictions, motivating storage to conduct economic withholding without assuming market power. As energy storage takes up more significant roles in wholesale electricity markets, understanding its motivations for economic withholding and the consequent effects on social welfare becomes increasingly vital. This paper derives a theoretical framework to study the economic capacity withholding behavior of storage participating in competitive electricity markets and validate our results in simulations based on the ISO New England system. We demonstrate that storage bids can reach unbounded high levels under conditions where future price predictions show bounded expectations but unbounded deviations. Conversely, in scenarios with peak price limitations, we show the upper bounds of storage bids are grounded in bounded price expectations. Most importantly, we show that storage capacity withholding can potentially lower the overall system cost when price models account for system uncertainties. Our paper reveals energy storage is not a market manipulator but an honest player contributing to the social welfare. It helps electricity market researchers and operators better understand the economic withholding behavior of storage and reform market policies to maximize storage contributing to a cost-efficient decolonization.
Systems and Control (EESS)
REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving
Achieving rapid and effective active collision avoidance in dynamic interactive traffic remains a core challenge for autonomous driving. This paper proposes REACT (Runtime-Enabled Active Collision-avoidance Technique), a closed-loop framework that integrates risk assessment with active avoidance control. By leveraging energy transfer principles and human-vehicle-road interaction modeling, REACT dynamically quantifies runtime risk and constructs a continuous spatial risk field. The system incorporates physically grounded safety constraints such as directional risk and traffic rules to identify high-risk zones and generate feasible, interpretable avoidance behaviors. A hierarchical warning trigger strategy and lightweight system design enhance runtime efficiency while ensuring real-time responsiveness. Evaluations across four representative high-risk scenarios including car-following braking, cut-in, rear-approaching, and intersection conflict demonstrate REACT's capability to accurately identify critical risks and execute proactive avoidance. Its risk estimation aligns closely with human driver cognition (i.e., warning lead time < 0.4 s), achieving 100% safe avoidance with zero false alarms or missed detections. Furthermore, it exhibits superior real-time performance (< 50 ms latency), strong foresight, and generalization. The lightweight architecture achieves state-of-the-art accuracy, highlighting its potential for real-time deployment in safety-critical autonomous systems.
comment: 22 pages, 11 figures
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting
Accurate electricity load forecasting is essential for grid stability, resource optimization, and renewable energy integration. While transformer-based deep learning models like TimeGPT have gained traction in time-series forecasting, their effectiveness in long-term electricity load prediction remains uncertain. This study evaluates forecasting models ranging from classical regression techniques to advanced deep learning architectures using data from the ESD 2025 competition. The dataset includes two years of historical electricity load data, alongside temperature and global horizontal irradiance (GHI) across five sites, with a one-day-ahead forecasting horizon. Since actual test set load values remain undisclosed, leveraging predicted values would accumulate errors, making this a long-term forecasting challenge. We employ (i) Principal Component Analysis (PCA) for dimensionality reduction and (ii) frame the task as a regression problem, using temperature and GHI as covariates to predict load for each hour, (iii) ultimately stacking 24 models to generate yearly forecasts. Our results reveal that deep learning models, including TimeGPT, fail to consistently outperform simpler statistical and machine learning approaches due to the limited availability of training data and exogenous variables. In contrast, XGBoost, with minimal feature engineering, delivers the lowest error rates across all test cases while maintaining computational efficiency. This highlights the limitations of deep learning in long-term electricity forecasting and reinforces the importance of model selection based on dataset characteristics rather than complexity. Our study provides insights into practical forecasting applications and contributes to the ongoing discussion on the trade-offs between traditional and modern forecasting methods.
Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. However, they face significant challenges, including public acceptance, limited longevity, and barriers to commercialization. Meanwhile, noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. To address these issues, this study introduces a novel human-centered multimodal AI approach as intelligent compensatory mechanisms for lost motor functions that could potentially enable patients with severe paralysis to control high-dimensional assistive devices, such as dexterous robotic arms, using limited and noninvasive inputs. In contrast to the current state-of-the-art (SoTA) noninvasive approaches, our context-aware, multimodal shared-autonomy framework integrates deep reinforcement learning algorithms to blend limited low-dimensional user input with real-time environmental perception, enabling adaptive, dynamic, and intelligent interpretation of human intent for complex dexterous manipulation tasks, such as pick-and-place. The results from our ARAS (Adaptive Reinforcement learning for Amplification of limited inputs in Shared autonomy) trained with synthetic users over 50,000 computer simulation episodes demonstrated the first successful implementation of the proposed closed-loop human-in-the-loop paradigm, outperforming the SoTA shared autonomy algorithms. Following a zero-shot sim-to-real transfer, ARAS was evaluated on 23 human subjects, demonstrating high accuracy in dynamic intent detection and smooth, stable 3D trajectory control for dexterous pick-and-place tasks. ARAS user study achieved a high task success rate of 92.88%, with short completion times comparable to those of SoTA invasive assistive technologies.
Bilevel Transmission Expansion Planning with Joint Chance-Constrained Dispatch
In transmission expansion planning (TEP), network planners make long-term investment decisions while anticipating market clearing outcomes that are increasingly affected by renewable generation uncertainty. Additionally, market participants' sensitivity to network charges and the requirement for cost recovery by the network planner introduce further complexity. Since the day-ahead market clears before uncertainty realizes, explicitly modelling these uncertainties at the lower-level market clearing becomes important in bilevel TEP problems. In this paper, we introduce a novel bilevel TEP framework with lower-level joint chance-constrained market clearing that manages line flow constraints under wind uncertainty and accounts for the effect of network tariffs on participants' actual marginal costs and utility. To solve this complex problem, we propose a Strengthened Linear Approximation (SLA) technique for handling Wasserstein distributionally robust joint chance constraints with right-hand-side uncertainties (RHS-WDRJCC). The proposed method offers more efficient approximations without additional conservativeness and avoids the numerical issues encountered in existing approaches by introducing valid inequalities. The case study demonstrates that the proposed model achieves the desired out-of-sample constraint satisfaction probability. Moreover, the numerical results highlight the significant computational advantage of SLA, achieving up to a 26x speedup compared to existing methods such as worst-case conditional value-at-risk, while maintaining high solution quality.
Formal Uncertainty Propagation for Stochastic Dynamical Systems with Additive Noise
In this paper, we consider discrete-time non-linear stochastic dynamical systems with additive process noise in which both the initial state and noise distributions are uncertain. Our goal is to quantify how the uncertainty in these distributions is propagated by the system dynamics for possibly infinite time steps. In particular, we model the uncertainty over input and noise as ambiguity sets of probability distributions close in the $\rho$-Wasserstein distance and aim to quantify how these sets evolve over time. Our approach relies on results from quantization theory, optimal transport, and stochastic optimization to construct ambiguity sets of distributions centered at mixture of Gaussian distributions that are guaranteed to contain the true sets for both finite and infinite prediction time horizons. We empirically evaluate the effectiveness of our framework in various benchmarks from the control and machine learning literature, showing how our approach can efficiently and formally quantify the uncertainty in linear and non-linear stochastic dynamical systems.
Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Electrovibration technology can render tactile textures on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction force generated by applying an alternating voltage signal to the screen. This signal should be carefully calibrated for realistic and robust texture rendering. However, this process is challenging due to variations in sliding speed, applied force, and individual skin mechanics, which affect friction in complex and unpredictable ways. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N) while we measured contact forces and skin accelerations. The touchscreen was excited with amplitude-modulated voltage signals across frequencies relevant to touch. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Our results showed that the sliding speed influenced the cutoff frequency of the friction response as well as the moving mass and stiffness of the finger for the tested exploration ranges. Specifically, for every 1 mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Further correlation analysis revealed that finger stiffness affected the cutoff frequency more than the moving mass. Finally, we developed a practical model for electrovibration-induced finger friction on touchscreens that accounts for sliding speed variations, paving the way for delivering consistent haptic feedback through electrovibration.
comment: 19 pages, 13 figures, journal
Event disturbance rejection: a case study
This article introduces the problem of robust event disturbance rejection. Inspired by the design principle of linear output regulation, a control structure based on excitable systems is proposed. Unlike the linear case, contraction of the closed-loop system must be enforced through specific input signals. This induced contraction enables a steady-state analysis similar to the linear case. Thanks to the excitable nature of the systems, the focus shifts from precise trajectory tracking to the regulation of discrete events, such as spikes. The study emphasizes rejecting events rather than trajectories and demonstrates the robustness of the approach, even under mismatches between the controller and the exosystem. This work is a first step towards developing a design principle for event regulation.
comment: Accepted at NOLCOS (13th IFAC Symposium on Nonlinear Control Systems)
LLM-Enhanced Symbolic Control for Safety-Critical Applications
Motivated by Smart Manufacturing and Industry 4.0, we introduce a framework for synthesizing Abstraction-Based Controller Design (ABCD) for reach-avoid problems from Natural Language (NL) specifications using Large Language Models (LLMs). A Code Agent interprets an NL description of the control problem and translates it into a formal language interpretable by state-of-the-art symbolic control software, while a Checker Agent verifies the correctness of the generated code and enhances safety by identifying specification mismatches. Evaluations show that the system handles linguistic variability and improves robustness over direct planning with LLMs. The proposed approach lowers the barrier to formal control synthesis by enabling intuitive, NL-based task definition while maintaining safety guarantees through automated validation.
Beyond KL-divergence: Risk Aware Control Through Cross Entropy and Adversarial Entropy Regularization
While the idea of robust dynamic programming (DP) is compelling for systems affected by uncertainty, addressing worst-case disturbances generally results in excessive conservatism. This paper introduces a method for constructing control policies robust to adversarial disturbance distributions that relate to a provided empirical distribution. The character of the adversary is shaped by a regularization term comprising a weighted sum of (i) the cross-entropy between the empirical and the adversarial distributions, and (ii) the entropy of the adversarial distribution itself. The regularization weights are interpreted as the likelihood factor and the temperature respectively. The proposed framework leads to an efficient DP-like algorithm -- referred to as the minsoftmax algorithm -- to obtain the optimal control policy, where the disturbances follow an analytical softmax distribution in terms of the empirical distribution, temperature, and likelihood factor. It admits a number of control-theoretic interpretations and can thus be understood as a flexible tool for integrating complementary features of related control frameworks. In particular, in the linear model quadratic cost setting, with a Gaussian empirical distribution, we draw connections to the well-known $\mathcal{H}_{\infty}$-control. We illustrate our results through a numerical example.
Lifelong reinforcement learning for health-aware fast charging of lithium-ion batteries
Fast charging of lithium-ion batteries remains a critical bottleneck for widespread adoption of electric vehicles and stationary energy storage systems, as improperly designed fast charging can accelerate battery degradation and shorten lifespan. In this work, we address this challenge by proposing a health-aware fast charging strategy that explicitly balances charging speed and battery longevity across the entire service life. The key innovation lies in establishing a mapping between anode overpotential and the state of health (SoH) of battery, which is then used to constrain the terminal charging voltage in a twin delayed deep deterministic policy gradient (TD3) framework. By incorporating this SoH-dependent voltage constraint, our designed deep learning method mitigates side reactions and effectively extends battery life. To validate the proposed approach, a high-fidelity single particle model with electrolyte is implemented in the widely adopted PyBaMM simulation platform, capturing degradation phenomena at realistic scales. Comparative life-cycle simulations against conventional CC-CV, its variants, and constant current-constant overpotential methods show that the TD3-based controller reduces overall degradation while maintaining competitively fast charge times. These results demonstrate the practical viability of deep reinforcement learning for advanced battery management systems and pave the way for future explorations of health-aware, performance-optimized charging strategies.
User-centric Vehicle-to-Grid Optimization with an Input Convex Neural Network-based Battery Degradation Model
We propose a data-driven, user-centric vehicle-to-grid (V2G) methodology based on multi-objective optimization to balance battery degradation and V2G revenue according to EV user preference. Given the lack of accurate and generalizable battery degradation models, we leverage input convex neural networks (ICNNs) to develop a data-driven degradation model trained on extensive experimental datasets. This approach enables our model to capture nonconvex dependencies on battery temperature and time while maintaining convexity with respect to the charging rate. Such a partial convexity property ensures that the second stage of our methodology remains computationally efficient. In the second stage, we integrate our data-driven degradation model into a multi-objective optimization framework to generate an optimal smart charging profile for each EV. This profile effectively balances the trade-off between financial benefits from V2G participation and battery degradation, controlled by a hyperparameter reflecting the user prioritization of battery health. Numerical simulations show the high accuracy of the ICNN model in predicting battery degradation for unseen data. Finally, we present a trade-off curve illustrating financial benefits from V2G versus losses from battery health degradation based on user preferences and showcase smart charging strategies under realistic scenarios.
A User-centric Game for Balancing V2G Benefits with Battery Degradation of Electric Vehicles
We present a novel user centric vehicle to grid framework that enables electric vehicle users to balance the trade off between financial benefits from VtoG and battery health degradation based on individual preference signals.
DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production
Plastic injection molding remains essential to modern manufacturing. However, optimizing process parameters to balance product quality and profitability under dynamic environmental and economic conditions remains a persistent challenge. This study presents a novel deep reinforcement learning (DRL)-based framework for real-time process optimization in injection molding, integrating product quality and profitability into the control objective. A profit function was developed to reflect real-world manufacturing costs, incorporating resin, mold wear, and electricity prices, including time-of-use variations. Surrogate models were constructed to predict product quality and cycle time, enabling efficient offline training of DRL agents using soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. Experimental results demonstrate that the proposed DRL framework can dynamically adapt to seasonal and operational variations, consistently maintaining product quality while maximizing profit. Compared to traditional optimization methods such as genetic algorithms, the DRL models achieved comparable economic performance with up to 135x faster inference speeds, making them well-suited for real-time applications. The framework's scalability and adaptability highlight its potential as a foundation for intelligent, data-driven decision-making in modern manufacturing environments.
comment: 50 pages, 10 figures
A Scalable Procedure for $\mathcal{H}_{\infty}-$Control Design
This paper proposes a novel gradient based scalable procedure for $\mathcal{H}_{\infty}-$control design. We compute the gradient using algebraic Riccati equation and then couple it with a novel Armijo rule inspired step-size selection procedure. We perform numerical experiments of the proposed solution procedure on an exhaustive list of benchmark engineering systems to show its convergence properties. Finally we compare our proposed solution procedure with available semi-definite programming based gradient-descent algorithm to demonstrate its scalability.
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS
Simultaneous localization and mapping (SLAM) approaches for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments possess additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Utilizing the modified Hausdorff distance (MHD) metric, the method can solve the scan matching robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.
Comparative Analysis of Black-Box Optimization Methods for Weather Intervention Design
As climate change increases the threat of weather-related disasters, research on weather control is gaining importance. The objective of weather control is to mitigate disaster risks by administering interventions with optimal timing, location, and intensity. However, the optimization process is highly challenging due to the vast scale and complexity of weather phenomena, which introduces two major challenges. First, obtaining accurate gradient information for optimization is difficult. In addition, numerical weather prediction (NWP) models demand enormous computational resources, necessitating parameter optimization with minimal function evaluations. To address these challenges, this study proposes a method for designing weather interventions based on black-box optimization, which enables efficient exploration without requiring gradient information. The proposed method is evaluated in two distinct control scenarios: one-shot initial value intervention and sequential intervention based on model predictive control. Furthermore, a comparative analysis is conducted among four representative black-box optimization methods in terms of total rainfall reduction. Experimental results show that Bayesian optimization achieves higher control effectiveness than the others, particularly in high-dimensional search spaces. These findings suggest that Bayesian optimization is a highly effective approach for weather intervention computation.
comment: 15 pages, 11 figures
Enhancing Secrecy Energy Efficiency in RIS-Aided Aerial Mobile Edge Computing Networks: A Deep Reinforcement Learning Approach
This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)-aided unmanned aerial vehicle (UAV)-mobile edge computing (MEC) scheme to enhance the secure task offloading while minimizing the energy consumption of the UAV subject to task completion constraints. Leveraging a data-driven approach, we propose a comprehensive optimization strategy that jointly optimizes the aerial MEC (AMEC)'s trajectory, task offloading partitioning, UE transmission scheduling, and RIS phase shifts. Our objective centers on optimizing the secrecy energy efficiency (SEE) of UE task offloading transmissions while preserving the AMEC's energy resources and meeting the task completion time requirements. Numerical results show that the proposed solution can effectively safeguard legitimate task offloading transmissions while preserving AMEC energy.
comment: This article has been accepted for publication in the IEEE 2025 International Conference on Communications (ICC2025)
RAN Tester UE: An Automated Declarative UE Centric Security Testing Platform
Cellular networks require strict security procedures and measures across various network components, from core to radio access network (RAN) and end-user devices. As networks become increasingly complex and interconnected, as in O-RAN deployments, they are exposed to a numerous security threats. Therefore, ensuring robust security is critical for O-RAN to protect network integrity and safeguard user data. This requires rigorous testing methodologies to mitigate threats. This paper introduces an automated, adaptive, and scalable user equipment (UE) based RAN security testing framework designed to address the shortcomings of existing RAN testing solutions. Experimental results on a 5G software radio testbed built with commercial off-the-shelf hardware and open source software validate the efficiency and reproducibility of sample security test procedures developed on the RAN Tester UE framework.
comment: This article has been accepted for publication in the ACM Symposium on Access Control Models and Technologies (SACMAT'25)
Optimal $\mathbb{H}_2$ Control with Passivity-Constrained Feedback: Convex Approach
We consider the $\mathbb{H}_2$-optimal feedback control problem, for the case in which the plant is passive with bounded $\mathbb{L}_2$ gain, and the feedback law is constrained to be output-strictly passive. In this circumstance, we show that this problem distills to a convex optimal control problem, in which the optimization domain is the associated Youla parameter for the closed-loop system. This enables the globally-optimal controller to be solved as an infinite-dimensional but convex optimization. Near-optimal solutions may be found through the finite-dimensional convex truncation of this infinite-dimensional domain. The idea is demonstrated on a simple vibration suppression example.
On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $l_1$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 28 pages, 2 figures
Data-based control of Logical Networks
In recent years, data-driven approaches have become increasingly pervasive across all areas of control engineering. However, the applications of data-based techniques to Boolean Control Networks (BCNs) are still very limited. In this paper we aim to fill this gap, by exploring the possibility of solving three fundamental control problems, i.e., state feedback stabilization, safe control and output regulation, for a BCN, leveraging only a limited amount of data generated by the network, without knowing or identifying its model.
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
comment: Uploaded by mistake, this is second version of the previous upload real-time deformation correction paper (arXiv:2406.07797). I will upload this paper as a second version of that paper
Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses
State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts law task that involved the aperture of the cursor and the control of orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Damping Identification Sensitivity in Flutter Speed Estimation
Predicting flutter remains a key challenge in aeroelastic research, with certain models relying on modal parameters, such as natural frequencies and damping ratios. These models are particularly useful in early design stages or for the development of small Unmanned Aerial Vehicles (maximum take-off mass below 7 kg). This study evaluates two frequency-domain system identification methods, Fast Relaxed Vector Fitting (FRVF) and the Loewner Framework (LF), for predicting the flutter onset speed of a flexible wing model. Both methods are applied to extract modal parameters from Ground Vibration Testing data, which are subsequently used to develop a reduced-order model with two degrees of freedom. Results indicate that FRVF and LF-informed models provide reliable flutter speed, with predictions deviating by no more than 3% (FRVF) and 5% (LF) from the N4SID-informed benchmark. The findings highlight the sensitivity of flutter speed predictions to damping ratio identification accuracy and demonstrate the potential of these methods as computationally efficient alternatives for preliminary aeroelastic assessments.
Satellite Autonomous Clock Fault Monitoring with Inter-Satellite Ranges Using Euclidean Distance Matrices
To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a novel onboard clock phase jump detection framework for satellite constellations using range measurements obtained from dual one-way inter-satellite links. Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, providing flexibility for lunar satellite networks with diverse satellite types and operators. We model satellite constellations as graphs, where satellites are vertices and inter-satellite links are edges. The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique sub-graphs. The proposed method is validated through simulations of a GPS constellation and a notional constellation around the Moon, demonstrating its effectiveness in various configurations.
comment: This manuscript was submitted to the NAVIGATION: Journal of the Institute of Navigation
Demonstrating a Control Framework for Physical Human-Robot Interaction Toward Industrial Applications
Physical Human-Robot Interaction (pHRI) is critical for implementing Industry 5.0, which focuses on human-centric approaches. However, few studies explore the practical alignment of pHRI to industrial-grade performance. This paper introduces a versatile control framework designed to bridge this gap by incorporating the torque-based control modes: compliance control, null-space compliance, and dual compliance, all in static and dynamic scenarios. Thanks to our second-order Quadratic Programming (QP) formulation, strict kinematic and collision constraints are integrated into the system as safety features, and a weighted hierarchy guarantees singularity-robust task tracking performance. The framework is implemented on a Kinova Gen3 collaborative robot (cobot) equipped with a Bota force/torque sensor. A DualShock 4 game controller is attached to the robot's end-effector to demonstrate the framework's capabilities. This setup enables seamless dynamic switching between the modes, and real-time adjustments of parameters, such as transitioning between position and torque control or selecting a more robust custom-developed low-level torque controller over the default one. Built on the open-source robotic control software mc_rtc, our framework ensures reproducibility for both research and industrial deployment, this framework demonstrates a step toward industrial-grade performance and repeatability, showcasing its potential as a robust pHRI control system for industrial environments.
comment: Demo Paper submitted to Robotics: Science and Systems (RSS2025), accepted
A finite-sample bound for identifying partially observed linear switched systems from a single trajectory
We derive a finite-sample probabilistic bound on the parameter estimation error of a system identification algorithm for Linear Switched Systems. The algorithm estimates Markov parameters from a single trajectory and applies a variant of the Ho-Kalman algorithm to recover the system matrices. Our bound guarantees statistical consistency under the assumption that the true system exhibits quadratic stability. The proof leverages the theory of weakly dependent processes. To the best of our knowledge, this is the first finite-sample bound for this algorithm in the single-trajectory setting.
Large Vision Model-Enhanced Digital Twin with Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks
Optimization of user association in a densely deployed cellular network is usually challenging and even more complicated due to the dynamic nature of user mobility and fluctuation in user counts. While deep reinforcement learning (DRL) emerges as a promising solution, its application in practice is hindered by high trial-and-error costs in real world and unsatisfactory physical network performance during training. Also, existing DRL-based user association methods are typically applicable to scenarios with a fixed number of users due to convergence and compatibility challenges. To address these limitations, we introduce a large vision model (LVM)-enhanced digital twin (DT) for wireless networks and propose a parallel DT-driven DRL method for user association and load balancing in networks with dynamic user counts, distribution, and mobility patterns. To construct this LVM-enhanced DT for DRL training, we develop a zero-shot generative user mobility model, named Map2Traj, based on the diffusion model. Map2Traj estimates user trajectory patterns and spatial distributions solely from street maps. DRL models undergo training in the DT environment, avoiding direct interactions with physical networks. To enhance the generalization ability of DRL models for dynamic scenarios, a parallel DT framework is further established to alleviate strong correlation and non-stationarity in single-environment training and improve training efficiency. Numerical results show that the developed LVM-enhanced DT achieves closely comparable training efficacy to the real environment, and the proposed parallel DT framework even outperforms the single real-world environment in DRL training with nearly 20\% gain in terms of cell-edge user performance.
comment: arXiv admin note: text overlap with arXiv:2407.19765. This work has been submitted to the IEEE for possible publication
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: 9 pages, 8 figures, conference paper submitted to IEEE
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: 19 pages, 6 figures, 2 author photos. Affiliations: Electrical Engineering Department, Pennsylvania State University. Keywords: Lyapunov Stability, Model Predictive Control (MPC), Control Optimization, Koopman Operator, Bilinear Systems, Nonlinear Systems, Closed-loop Design, Control Constraints
Economic Capacity Withholding Bounds of Competitive Energy Storage Bidders
Economic withholding in electricity markets refers to generators bidding higher than their true marginal fuel cost, and is a typical approach to exercising market power. However, existing market designs require storage to design bids strategically based on their own future price predictions, motivating storage to conduct economic withholding without assuming market power. As energy storage takes up more significant roles in wholesale electricity markets, understanding its motivations for economic withholding and the consequent effects on social welfare becomes increasingly vital. This paper derives a theoretical framework to study the economic capacity withholding behavior of storage participating in competitive electricity markets and validate our results in simulations based on the ISO New England system. We demonstrate that storage bids can reach unbounded high levels under conditions where future price predictions show bounded expectations but unbounded deviations. Conversely, in scenarios with peak price limitations, we show the upper bounds of storage bids are grounded in bounded price expectations. Most importantly, we show that storage capacity withholding can potentially lower the overall system cost when price models account for system uncertainties. Our paper reveals energy storage is not a market manipulator but an honest player contributing to the social welfare. It helps electricity market researchers and operators better understand the economic withholding behavior of storage and reform market policies to maximize storage contributing to a cost-efficient decolonization.
Robotics
Loop closure grasping: Topological transformations enable strong, gentle, and versatile grasps
Grasping mechanisms must both create and subsequently hold grasps that permit safe and effective object manipulation. Existing mechanisms address the different functional requirements of grasp creation and grasp holding using a single morphology, but have yet to achieve the simultaneous strength, gentleness, and versatility needed for many applications. We present "loop closure grasping", a class of robotic grasping that addresses these different functional requirements through topological transformations between open-loop and closed-loop morphologies. We formalize these morphologies for grasping, formulate the loop closure grasping method, and present principles and a design architecture that we implement using soft growing inflated beams, winches, and clamps. The mechanisms' initial open-loop topology enables versatile grasp creation via unencumbered tip movement, and closing the loop enables strong and gentle holding with effectively infinite bending compliance. Loop closure grasping circumvents the tradeoffs of single-morphology designs, enabling grasps involving historically challenging objects, environments, and configurations.
Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning
Foundation models can provide robust high-level reasoning on appropriate safety interventions in hazardous scenarios beyond a robot's training data, i.e. out-of-distribution (OOD) failures. However, due to the high inference latency of Large Vision and Language Models, current methods rely on manually defined intervention policies to enact fallbacks, thereby lacking the ability to plan generalizable, semantically safe motions. To overcome these challenges we present FORTRESS, a framework that generates and reasons about semantically safe fallback strategies in real time to prevent OOD failures. At a low frequency in nominal operations, FORTRESS uses multi-modal reasoners to identify goals and anticipate failure modes. When a runtime monitor triggers a fallback response, FORTRESS rapidly synthesizes plans to fallback goals while inferring and avoiding semantically unsafe regions in real time. By bridging open-world, multi-modal reasoning with dynamics-aware planning, we eliminate the need for hard-coded fallbacks and human safety interventions. FORTRESS outperforms on-the-fly prompting of slow reasoning models in safety classification accuracy on synthetic benchmarks and real-world ANYmal robot data, and further improves system safety and planning success in simulation and on quadrotor hardware for urban navigation.
comment: Website: https://milanganai.github.io/fortress/
AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect
Finding high-quality solutions quickly is an important objective in motion planning. This is especially true for high-degree-of-freedom robots. Satisficing planners have traditionally found feasible solutions quickly but provide no guarantees on their optimality, while almost-surely asymptotically optimal (a.s.a.o.) planners have probabilistic guarantees on their convergence towards an optimal solution but are more computationally expensive. This paper uses the AO-x meta-algorithm to extend the satisficing RRT-Connect planner to optimal planning. The resulting Asymptotically Optimal RRT-Connect (AORRTC) finds initial solutions in similar times as RRT-Connect and uses any additional planning time to converge towards the optimal solution in an anytime manner. It is proven to be probabilistically complete and a.s.a.o. AORRTC was tested with the Panda (7 DoF) and Fetch (8 DoF) robotic arms on the MotionBenchMaker dataset. These experiments show that AORRTC finds initial solutions as fast as RRT-Connect and faster than the tested state-of-the-art a.s.a.o. algorithms while converging to better solutions faster. AORRTC finds solutions to difficult high-DoF planning problems in milliseconds where the other a.s.a.o. planners could not consistently find solutions in seconds. This performance was demonstrated both with and without single instruction/multiple data (SIMD) acceleration.
comment: 8 pages, 4 figures, 1 table. A video of AORRTC can be found at https://www.youtube.com/watch?v=j1itxP3KuiM . Information on the implementation of AORRTC is available at https://robotic-esp.com/code/aorrtc/
Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation
Reinforcement learning (RL) has demonstrated remarkable potential in robotic manipulation but faces challenges in sample inefficiency and lack of interpretability, limiting its applicability in real world scenarios. Enabling the agent to gain a deeper understanding and adapt more efficiently to diverse working scenarios is crucial, and strategic knowledge utilization is a key factor in this process. This paper proposes a Knowledge Capture, Adaptation, and Composition (KCAC) framework to systematically integrate knowledge transfer into RL through cross-task curriculum learning. KCAC is evaluated using a two block stacking task in the CausalWorld benchmark, a complex robotic manipulation environment. To our knowledge, existing RL approaches fail to solve this task effectively, reflecting deficiencies in knowledge capture. In this work, we redesign the benchmark reward function by removing rigid constraints and strict ordering, allowing the agent to maximize total rewards concurrently and enabling flexible task completion. Furthermore, we define two self-designed sub-tasks and implement a structured cross-task curriculum to facilitate efficient learning. As a result, our KCAC approach achieves a 40 percent reduction in training time while improving task success rates by 10 percent compared to traditional RL methods. Through extensive evaluation, we identify key curriculum design parameters subtask selection, transition timing, and learning rate that optimize learning efficiency and provide conceptual guidance for curriculum based RL frameworks. This work offers valuable insights into curriculum design in RL and robotic learning.
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning
Imitation learning (IL) and reinforcement learning (RL) each offer distinct advantages for robotics policy learning: IL provides stable learning from demonstrations, and RL promotes generalization through exploration. While existing robot learning approaches using IL-based pre-training followed by RL-based fine-tuning are promising, this two-step learning paradigm often suffers from instability and poor sample efficiency during the RL fine-tuning phase. In this work, we introduce IN-RIL, INterleaved Reinforcement learning and Imitation Learning, for policy fine-tuning, which periodically injects IL updates after multiple RL updates and hence can benefit from the stability of IL and the guidance of expert data for more efficient exploration throughout the entire fine-tuning process. Since IL and RL involve different optimization objectives, we develop gradient separation mechanisms to prevent destructive interference during \ABBR fine-tuning, by separating possibly conflicting gradient updates in orthogonal subspaces. Furthermore, we conduct rigorous analysis, and our findings shed light on why interleaving IL with RL stabilizes learning and improves sample-efficiency. Extensive experiments on 14 robot manipulation and locomotion tasks across 3 benchmarks, including FurnitureBench, OpenAI Gym, and Robomimic, demonstrate that \ABBR can significantly improve sample efficiency and mitigate performance collapse during online finetuning in both long- and short-horizon tasks with either sparse or dense rewards. IN-RIL, as a general plug-in compatible with various state-of-the-art RL algorithms, can significantly improve RL fine-tuning, e.g., from 12\% to 88\% with 6.3x improvement in the success rate on Robomimic Transport. Project page: https://github.com/ucd-dare/IN-RIL.
Internal State Estimation in Groups via Active Information Gathering
Accurately estimating human internal states, such as personality traits or behavioral patterns, is critical for enhancing the effectiveness of human-robot interaction, particularly in group settings. These insights are key in applications ranging from social navigation to autism diagnosis. However, prior methods are limited by scalability and passive observation, making real-time estimation in complex, multi-human settings difficult. In this work, we propose a practical method for active human personality estimation in groups, with a focus on applications related to Autism Spectrum Disorder (ASD). Our method combines a personality-conditioned behavior model, based on the Eysenck 3-Factor theory, with an active robot information gathering policy that triggers human behaviors through a receding-horizon planner. The robot's belief about human personality is then updated via Bayesian inference. We demonstrate the effectiveness of our approach through simulations, user studies with typical adults, and preliminary experiments involving participants with ASD. Our results show that our method can scale to tens of humans and reduce personality prediction error by 29.2% and uncertainty by 79.9% in simulation. User studies with typical adults confirm the method's ability to generalize across complex personality distributions. Additionally, we explore its application in autism-related scenarios, demonstrating that the method can identify the difference between neurotypical and autistic behavior, highlighting its potential for diagnosing ASD. The results suggest that our framework could serve as a foundation for future ASD-specific interventions.
AutoCam: Hierarchical Path Planning for an Autonomous Auxiliary Camera in Surgical Robotics
Incorporating an autonomous auxiliary camera into robot-assisted minimally invasive surgery (RAMIS) enhances spatial awareness and eliminates manual viewpoint control. Existing path planning methods for auxiliary cameras track two-dimensional surgical features but do not simultaneously account for camera orientation, workspace constraints, and robot joint limits. This study presents AutoCam: an automatic auxiliary camera placement method to improve visualization in RAMIS. Implemented on the da Vinci Research Kit, the system uses a priority-based, workspace-constrained control algorithm that combines heuristic geometric placement with nonlinear optimization to ensure robust camera tracking. A user study (N=6) demonstrated that the system maintained 99.84% visibility of a salient feature and achieved a pose error of 4.36 $\pm$ 2.11 degrees and 1.95 $\pm$ 5.66 mm. The controller was computationally efficient, with a loop time of 6.8 $\pm$ 12.8 ms. An additional pilot study (N=6), where novices completed a Fundamentals of Laparoscopic Surgery training task, suggests that users can teleoperate just as effectively from AutoCam's viewpoint as from the endoscope's while still benefiting from AutoCam's improved visual coverage of the scene. These results indicate that an auxiliary camera can be autonomously controlled using the da Vinci patient-side manipulators to track a salient feature, laying the groundwork for new multi-camera visualization methods in RAMIS.
comment: 13 pages, 9 figures
NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning
Recent advances in deep generative models demonstrate unprecedented zero-shot generalization capabilities, offering great potential for robot manipulation in unstructured environments. Given a partial observation of a scene, deep generative models could generate the unseen regions and therefore provide more context, which enhances the capability of robots to generalize across unseen environments. However, due to the visual artifacts in generated images and inefficient integration of multi-modal features in policy learning, this direction remains an open challenge. We introduce NVSPolicy, a generalizable language-conditioned policy learning method that couples an adaptive novel-view synthesis module with a hierarchical policy network. Given an input image, NVSPolicy dynamically selects an informative viewpoint and synthesizes an adaptive novel-view image to enrich the visual context. To mitigate the impact of the imperfect synthesized images, we adopt a cycle-consistent VAE mechanism that disentangles the visual features into the semantic feature and the remaining feature. The two features are then fed into the hierarchical policy network respectively: the semantic feature informs the high-level meta-skill selection, and the remaining feature guides low-level action estimation. Moreover, we propose several practical mechanisms to make the proposed method efficient. Extensive experiments on CALVIN demonstrate the state-of-the-art performance of our method. Specifically, it achieves an average success rate of 90.4\% across all tasks, greatly outperforming the recent methods. Ablation studies confirm the significance of our adaptive novel-view synthesis paradigm. In addition, we evaluate NVSPolicy on a real-world robotic platform to demonstrate its practical applicability.
pc-dbCBS: Kinodynamic Motion Planning of Physically-Coupled Robot Teams
Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.
comment: This work has been submitted to the IEEE for possible publication
Inferring Driving Maps by Deep Learning-based Trail Map Extraction CVPR
High-definition (HD) maps offer extensive and accurate environmental information about the driving scene, making them a crucial and essential element for planning within autonomous driving systems. To avoid extensive efforts from manual labeling, methods for automating the map creation have emerged. Recent trends have moved from offline mapping to online mapping, ensuring availability and actuality of the utilized maps. While the performance has increased in recent years, online mapping still faces challenges regarding temporal consistency, sensor occlusion, runtime, and generalization. We propose a novel offline mapping approach that integrates trails - informal routes used by drivers - into the map creation process. Our method aggregates trail data from the ego vehicle and other traffic participants to construct a comprehensive global map using transformer-based deep learning models. Unlike traditional offline mapping, our approach enables continuous updates while remaining sensor-agnostic, facilitating efficient data transfer. Our method demonstrates superior performance compared to state-of-the-art online mapping approaches, achieving improved generalization to previously unseen environments and sensor configurations. We validate our approach on two benchmark datasets, highlighting its robustness and applicability in autonomous driving systems.
comment: This paper was accepted at the CVPR WAD 2025 Workshop
SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
Research on autonomous robotic surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications require dexterous manipulation over extended time scales while demanding generalization across diverse variations in human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To bridge this gap, we propose a hierarchical framework for dexterous, long-horizon surgical tasks. Our method employs a high-level policy for task planning and a low-level policy for generating task-space controls for the surgical robot. The high-level planner plans tasks using language, producing task-specific or corrective instructions that guide the robot at a coarse level. Leveraging language as a planning modality offers an intuitive and generalizable interface, mirroring how experienced surgeons instruct traineers during procedures. We validate our framework in ex-vivo experiments on a complex minimally invasive procedure, cholecystectomy, and conduct ablative studies to assess key design choices. Our approach achieves a 100% success rate across n=8 different ex-vivo gallbladders, operating fully autonomously without human intervention. The hierarchical approach greatly improves the policy's ability to recover from suboptimal states that are inevitable in the highly dynamic environment of realistic surgical applications. This work represents the first demonstration of step-level autonomy, marking a critical milestone toward autonomous surgical systems for clinical studies. By advancing generalizable autonomy in surgical robotics, our approach brings the field closer to real-world deployment.
Context-aware collaborative pushing of heavy objects using skeleton-based intention prediction ICRA 2025
In physical human-robot interaction, force feedback has been the most common sensing modality to convey the human intention to the robot. It is widely used in admittance control to allow the human to direct the robot. However, it cannot be used in scenarios where direct force feedback is not available since manipulated objects are not always equipped with a force sensor. In this work, we study one such scenario: the collaborative pushing and pulling of heavy objects on frictional surfaces, a prevalent task in industrial settings. When humans do it, they communicate through verbal and non-verbal cues, where body poses, and movements often convey more than words. We propose a novel context-aware approach using Directed Graph Neural Networks to analyze spatio-temporal human posture data to predict human motion intention for non-verbal collaborative physical manipulation. Our experiments demonstrate that robot assistance significantly reduces human effort and improves task efficiency. The results indicate that incorporating posture-based context recognition, either together with or as an alternative to force sensing, enhances robot decision-making and control efficiency.
comment: Accepted to be presented at ICRA 2025 conference. Video: https://youtu.be/qy7l_wGOyzo
Quad-LCD: Layered Control Decomposition Enables Actuator-Feasible Quadrotor Trajectory Planning
In this work, we specialize contributions from prior work on data-driven trajectory generation for a quadrotor system with motor saturation constraints. When motors saturate in quadrotor systems, there is an ``uncontrolled drift" of the vehicle that results in a crash. To tackle saturation, we apply a control decomposition and learn a tracking penalty from simulation data consisting of low, medium and high-cost reference trajectories. Our approach reduces crash rates by around $49\%$ compared to baselines on aggressive maneuvers in simulation. On the Crazyflie hardware platform, we demonstrate feasibility through experiments that lead to successful flights. Motivated by the growing interest in data-driven methods to quadrotor planning, we provide open-source lightweight code with an easy-to-use abstraction of hardware platforms.
comment: 4 pages, 4 figures
Force-Driven Validation for Collaborative Robotics in Automated Avionics Testing
ARTO is a project combining collaborative robots (cobots) and Artificial Intelligence (AI) to automate functional test procedures for civilian and military aircraft certification. This paper proposes a Deep Learning (DL) and eXplainable AI (XAI) approach, equipping ARTO with interaction analysis capabilities to verify and validate the operations on cockpit components. During these interactions, forces, torques, and end effector poses are recorded and preprocessed to filter disturbances caused by low performance force controllers and embedded Force Torque Sensors (FTS). Convolutional Neural Networks (CNNs) then classify the cobot actions as Success or Fail, while also identifying and reporting the causes of failure. To improve interpretability, Grad CAM, an XAI technique for visual explanations, is integrated to provide insights into the models decision making process. This approach enhances the reliability and trustworthiness of the automated testing system, facilitating the diagnosis and rectification of errors that may arise during testing.
Towards Safe Robot Foundation Models Using Inductive Biases
Safety is a critical requirement for the real-world deployment of robotic systems. Unfortunately, while current robot foundation models show promising generalization capabilities across a wide variety of tasks, they fail to address safety, an important aspect for ensuring long-term operation. Current robot foundation models assume that safe behavior should emerge by learning from a sufficiently large dataset of demonstrations. However, this approach has two clear major drawbacks. Firstly, there are no formal safety guarantees for a behavior cloning policy trained using supervised learning. Secondly, without explicit knowledge of any safety constraints, the policy may require an unreasonable number of additional demonstrations to even approximate the desired constrained behavior. To solve these key issues, we show how we can instead combine robot foundation models with geometric inductive biases using ATACOM, a safety layer placed after the foundation policy that ensures safe state transitions by enforcing action constraints. With this approach, we can ensure formal safety guarantees for generalist policies without providing extensive demonstrations of safe behavior, and without requiring any specific fine-tuning for safety. Our experiments show that our approach can be beneficial both for classical manipulation tasks, where we avoid unwanted collisions with irrelevant objects, and for dynamic tasks, such as the robot air hockey environment, where we can generate fast trajectories respecting complex tasks and joint space constraints.
comment: 14 pages, 5 figures
Training People to Reward Robots
Learning from demonstration (LfD) is a technique that allows expert teachers to teach task-oriented skills to robotic systems. However, the most effective way of guiding novice teachers to approach expert-level demonstrations quantitatively for specific teaching tasks remains an open question. To this end, this paper investigates the use of machine teaching (MT) to guide novice teachers to improve their teaching skills based on reinforcement learning from demonstration (RLfD). The paper reports an experiment in which novices receive MT-derived guidance to train their ability to teach a given motor skill with only 8 demonstrations and generalise this to previously unseen ones. Results indicate that the MT-guidance not only enhances robot learning performance by 89% on the training skill but also causes a 70% improvement in robot learning performance on skills not seen by subjects during training. These findings highlight the effectiveness of MT-guidance in upskilling human teaching behaviours, ultimately improving demonstration quality in RLfD.
comment: 6 pages
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
We present EmbodiedMAE, a unified 3D multi-modal representation for robot manipulation. Current approaches suffer from significant domain gaps between training datasets and robot manipulation tasks, while also lacking model architectures that can effectively incorporate 3D information. To overcome these limitations, we enhance the DROID dataset with high-quality depth maps and point clouds, constructing DROID-3D as a valuable supplement for 3D embodied vision research. Then we develop EmbodiedMAE, a multi-modal masked autoencoder that simultaneously learns representations across RGB, depth, and point cloud modalities through stochastic masking and cross-modal fusion. Trained on DROID-3D, EmbodiedMAE consistently outperforms state-of-the-art vision foundation models (VFMs) in both training efficiency and final performance across 70 simulation tasks and 20 real-world robot manipulation tasks on two robot platforms. The model exhibits strong scaling behavior with size and promotes effective policy learning from 3D inputs. Experimental results establish EmbodiedMAE as a reliable unified 3D multi-modal VFM for embodied AI systems, particularly in precise tabletop manipulation settings where spatial perception is critical.
FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation
This paper investigates training better visual world models for robot manipulation, i.e., models that can predict future visual observations by conditioning on past frames and robot actions. Specifically, we consider world models that operate on RGB-D frames (RGB-D world models). As opposed to canonical approaches that handle dynamics prediction mostly implicitly and reconcile it with visual rendering in a single model, we introduce FlowDreamer, which adopts 3D scene flow as explicit motion representations. FlowDreamer first predicts 3D scene flow from past frame and action conditions with a U-Net, and then a diffusion model will predict the future frame utilizing the scene flow. FlowDreamer is trained end-to-end despite its modularized nature. We conduct experiments on 4 different benchmarks, covering both video prediction and visual planning tasks. The results demonstrate that FlowDreamer achieves better performance compared to other baseline RGB-D world models by 7% on semantic similarity, 11% on pixel quality, and 6% on success rate in various robot manipulation domains.
comment: Project page: see https://sharinka0715.github.io/FlowDreamer/
Multi-Robot Task Allocation for Homogeneous Tasks with Collision Avoidance via Spatial Clustering
In this paper, a novel framework is presented that achieves a combined solution based on Multi-Robot Task Allocation (MRTA) and collision avoidance with respect to homogeneous measurement tasks taking place in industrial environments. The spatial clustering we propose offers to simultaneously solve the task allocation problem and deal with collision risks by cutting the workspace into distinguishable operational zones for each robot. To divide task sites and to schedule robot routes within corresponding clusters, we use K-means clustering and the 2-Opt algorithm. The presented framework shows satisfactory performance, where up to 93\% time reduction (1.24s against 17.62s) with a solution quality improvement of up to 7\% compared to the best performing method is demonstrated. Our method also completely eliminates collision points that persist in comparative methods in a most significant sense. Theoretical analysis agrees with the claim that spatial partitioning unifies the apparently disjoint tasks allocation and collision avoidance problems under conditions of many identical tasks to be distributed over sparse geographical areas. Ultimately, the findings in this work are of substantial importance for real world applications where both computational efficiency and operation free from collisions is of paramount importance.
comment: 5 pages, 4 figures, Scheduled for presentation at an upcoming conference
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests ICRA 2025
Despite significant advancements in Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), their robustness in real-world conditions, particularly under external disturbances, remains insufficiently explored. In this paper, we evaluate the resilience of a DRL-based agent designed to capture floating waste under various perturbations. We train the agent using domain randomization and evaluate its performance in real-world field tests, assessing its ability to handle unexpected disturbances such as asymmetric drag and an off-center payload. We assess the agent's performance under these perturbations in both simulation and real-world experiments, quantifying performance degradation and benchmarking it against an MPC baseline. Results indicate that the DRL agent performs reliably despite significant disturbances. Along with the open-source release of our implementation, we provide insights into effective training strategies, real-world challenges, and practical considerations for deploying DRLbased ASV controllers.
comment: Workshop on Field Robotics at ICRA 2025
Fast Heuristic Scheduling and Trajectory Planning for Robotic Fruit Harvesters with Multiple Cartesian Arms
This work proposes a fast heuristic algorithm for the coupled scheduling and trajectory planning of multiple Cartesian robotic arms harvesting fruits. Our method partitions the workspace, assigns fruit-picking sequences to arms, determines tight and feasible fruit-picking schedules and vehicle travel speed, and generates smooth, collision-free arm trajectories. The fruit-picking throughput achieved by the algorithm was assessed using synthetically generated fruit coordinates and a harvester design featuring up to 12 arms. The throughput increased monotonically as more arms were added. Adding more arms when fruit densities were low resulted in diminishing gains because it took longer to travel from one fruit to another. However, when there were enough fruits, the proposed algorithm achieved a linear speedup as the number of arms increased.
comment: This work will be submitted to the IEEE for possible publication
APEX: Action Priors Enable Efficient Exploration for Skill Imitation on Articulated Robots
Learning by imitation provides an effective way for robots to develop well-regulated complex behaviors and directly benefit from natural demonstrations. State-of-the-art imitation learning (IL) approaches typically leverage Adversarial Motion Priors (AMP), which, despite their impressive results, suffer from two key limitations. They are prone to mode collapse, which often leads to overfitting to the simulation environment and thus increased sim-to-real gap, and they struggle to learn diverse behaviors effectively. To overcome these limitations, we introduce APEX (Action Priors enable Efficient eXploration): a simple yet versatile imitation learning framework that integrates demonstrations directly into reinforcement learning (RL), maintaining high exploration while grounding behavior with expert-informed priors. We achieve this through a combination of decaying action priors, which initially bias exploration toward expert demonstrations but gradually allow the policy to explore independently. This is complemented by a multi-critic RL framework that effectively balances stylistic consistency with task performance. Our approach achieves sample-efficient imitation learning and enables the acquisition of diverse skills within a single policy. APEX generalizes to varying velocities and preserves reference-like styles across complex tasks such as navigating rough terrain and climbing stairs, utilizing only flat-terrain kinematic motion data as a prior. We validate our framework through extensive hardware experiments on the Unitree Go2 quadruped. There, APEX yields diverse and agile locomotion gaits, inherent gait transitions, and the highest reported speed for the platform to the best of our knowledge (peak velocity of ~3.3 m/s on hardware). Our results establish APEX as a compelling alternative to existing IL methods, offering better efficiency, adaptability, and real-world performance.
Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations
Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, whereas they could introduce the ``leaking corner issue", leading to inaccuracies in the results. In this paper, we define the ``leaking corner issue" in terms of value functions, propose and prove a necessary condition for its occurrence. We then use these theoretical contributions to introduce a new local updating method that efficiently corrects inaccurate value functions while maintaining the computational efficiency of the dimensionality reduction approaches. We demonstrate the effectiveness of our method through numerical simulations. Although we validate our method with the self-contained subsystem decomposition (SCSD), our approach is applicable to other dimensionality reduction techniques that introduce the ``leaking corners".
comment: 7 pages, Submitted to Conference on Decision and Control (CDC)
LEMON-Mapping: Loop-Enhanced Large-Scale Multi-Session Point Cloud Merging and Optimization for Globally Consistent Mapping
With the rapid development of robotics, multi-robot collaboration has become critical and challenging. One key problem is integrating data from multiple robots to build a globally consistent and accurate map for robust cooperation and precise localization. While traditional multi-robot pose graph optimization (PGO) maintains basic global consistency, it focuses primarily on pose optimization and ignores the geometric structure of the map. Moreover, PGO only uses loop closure as a constraint between two nodes, failing to fully exploit its capability to maintaining local consistency of multi-robot maps. Therefore, PGO-based multi-robot mapping methods often suffer from serious map divergence and blur, especially in regions with overlapping submaps. To address this issue, we propose Lemon-Mapping, a loop-enhanced framework for large-scale multi-session point cloud map fusion and optimization, which reasonably utilizes loop closure and improves the geometric quality of the map. We re-examine the role of loops for multi-robot mapping and introduce three key innovations. First, we develop a robust loop processing mechanism that effectively rejects outliers and a novel loop recall strategy to recover mistakenly removed loops. Second, we introduce a spatial bundle adjustment method for multi-robot maps that significantly reduces the divergence in overlapping regions and eliminates map blur. Third, we design a PGO strategy that leverages the refined constraints of bundle adjustment to extend the local accuracy to the global map. We validate our framework on several public datasets and a self-collected dataset. Experimental results demonstrate that our method outperforms traditional map merging approaches in terms of mapping accuracy and reduction of map divergence. Scalability experiments also demonstrate the strong capability of our framework to handle scenarios involving numerous robots.
Provably safe and human-like car-following behaviors: Part 2. A parsimonious multi-phase model with projected braking
Ensuring safe and human-like trajectory planning for automated vehicles amidst real-world uncertainties remains a critical challenge. While existing car-following models often struggle to consistently provide rigorous safety proofs alongside human-like acceleration and deceleration patterns, we introduce a novel multi-phase projection-based car-following model. This model is designed to balance safety and performance by incorporating bounded acceleration and deceleration rates while emulating key human driving principles. Building upon a foundation of fundamental driving principles and a multi-phase dynamical systems analysis (detailed in Part 1 of this study \citep{jin2025WA20-02_Part1}), we first highlight the limitations of extending standard models like Newell's with simple bounded deceleration. Inspired by human drivers' anticipatory behavior, we mathematically define and analyze projected braking profiles for both leader and follower vehicles, establishing safety criteria and new phase definitions based on the projected braking lead-vehicle problem. The proposed parsimonious model combines an extended Newell's model for nominal driving with a new control law for scenarios requiring projected braking. Using speed-spacing phase plane analysis, we provide rigorous mathematical proofs of the model's adherence to defined safe and human-like driving principles, including collision-free operation, bounded deceleration, and acceptable safe stopping distance, under reasonable initial conditions. Numerical simulations validate the model's superior performance in achieving both safety and human-like braking profiles for the stationary lead-vehicle problem. Finally, we discuss the model's implications and future research directions.
comment: 27 pages, 4 figures
Provably safe and human-like car-following behaviors: Part 1. Analysis of phases and dynamics in standard models
Trajectory planning is essential for ensuring safe driving in the face of uncertainties related to communication, sensing, and dynamic factors such as weather, road conditions, policies, and other road users. Existing car-following models often lack rigorous safety proofs and the ability to replicate human-like driving behaviors consistently. This article applies multi-phase dynamical systems analysis to well-known car-following models to highlight the characteristics and limitations of existing approaches. We begin by formulating fundamental principles for safe and human-like car-following behaviors, which include zeroth-order principles for comfort and minimum jam spacings, first-order principles for speeds and time gaps, and second-order principles for comfort acceleration/deceleration bounds as well as braking profiles. From a set of these zeroth- and first-order principles, we derive Newell's simplified car-following model. Subsequently, we analyze phases within the speed-spacing plane for the stationary lead-vehicle problem in Newell's model and its extensions, which incorporate both bounded acceleration and deceleration. We then analyze the performance of the Intelligent Driver Model and the Gipps model. Through this analysis, we highlight the limitations of these models with respect to some of the aforementioned principles. Numerical simulations and empirical observations validate the theoretical insights. Finally, we discuss future research directions to further integrate safety, human-like behaviors, and vehicular automation in car-following models, which are addressed in Part 2 of this study \citep{jin2025WA20-02_Part2}, where we develop a novel multi-phase projection-based car-following model that addresses the limitations identified here.
comment: 29 pages, 7 figures
Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots
Achieving animal-like agility is a longstanding goal in quadrupedal robotics. While recent studies have successfully demonstrated imitation of specific behaviors, enabling robots to replicate a broader range of natural behaviors in real-world environments remains an open challenge. Here we propose an integrated controller comprising a Basic Behavior Controller (BBC) and a Task-Specific Controller (TSC) which can effectively learn diverse natural quadrupedal behaviors in an enhanced simulator and efficiently transfer them to the real world. Specifically, the BBC is trained using a novel semi-supervised generative adversarial imitation learning algorithm to extract diverse behavioral styles from raw motion capture data of real dogs, enabling smooth behavior transitions by adjusting discrete and continuous latent variable inputs. The TSC, trained via privileged learning with depth images as input, coordinates the BBC to efficiently perform various tasks. Additionally, we employ evolutionary adversarial simulator identification to optimize the simulator, aligning it closely with reality. After training, the robot exhibits diverse natural behaviors, successfully completing the quadrupedal agility challenge at an average speed of 1.1 m/s and achieving a peak speed of 3.2 m/s during hurdling. This work represents a substantial step toward animal-like agility in quadrupedal robots, opening avenues for their deployment in increasingly complex real-world environments.
Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots
Deployable structures inspired by origami offer lightweight, compact, and reconfigurable solutions for robotic and architectural applications. We present a geometric and mechanical framework for Yoshimura-Ori modules that supports a diverse set of metastable states, including newly identified asymmetric "pop-out" and "hyperfolded" configurations. These states are governed by three parameters -- tilt angle, phase shift, and slant height -- and enable discrete, programmable transformations. Using this model, we develop forward and inverse kinematic strategies to stack modules into deployable booms that approximate complex 3D shapes. We validate our approach through mechanical tests and demonstrate a tendon- and pneumatically-actuated Yoshimura Space Crane capable of object manipulation, solar tracking, and high load-bearing performance. A meter-scale solar charging station further illustrates the design's scalability. These results establish Yoshimura-Ori structures as a promising platform for adaptable, multifunctional deployable systems in both terrestrial and space environments.
Large-Scale Gaussian Splatting SLAM
The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encouraging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are only available for indoor environments. The robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. The proposed LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. For the scalability of large-scale scenarios, we introduce continuous Gaussian Splatting submaps to tackle unbounded scenes with limited memory. Loops are detected between GS submaps by place recognition and the relative pose between looped keyframes is optimized utilizing rendering and feature warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. With extensive evaluations on the EuRoc and KITTI datasets, LSG-SLAM achieves superior performance over existing Neural, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.
Diffusion-SAFE: Shared Autonomy Framework with Diffusion for Safe Human-to-Robot Driving Handover
Safe handover in shared autonomy for vehicle control is well-established in modern vehicles. However, avoiding accidents often requires action several seconds in advance. This necessitates understanding human driver behavior and an expert control strategy for seamless intervention when a collision or unsafe state is predicted. We propose Diffusion-SAFE, a closed-loop shared autonomy framework leveraging diffusion models to: (1) predict human driving behavior for detection of potential risks, (2) generate safe expert trajectories, and (3) enable smooth handovers by blending human and expert policies over a short time horizon. Unlike prior works which use engineered score functions to rate driving performance, our approach enables both performance evaluation and optimal action sequence generation from demonstrations. By adjusting the forward and reverse processes of the diffusion-based copilot, our method ensures a gradual transition of control authority, by mimicking the drivers' behavior before intervention, which mitigates abrupt takeovers, leading to smooth transitions. We evaluated Diffusion-SAFE in both simulation (CarRacing-v0) and real-world (ROS-based race car), measuring human-driving similarity, safety, and computational efficiency. Results demonstrate a 98.5\% successful handover rate, highlighting the framework's effectiveness in progressively correcting human actions and continuously sampling optimal robot actions.
Unsupervised Radar Point Cloud Enhancement via Arbitrary LiDAR Guided Diffusion Prior
In industrial automation, radar is a critical sensor in machine perception. However, the angular resolution of radar is inherently limited by the Rayleigh criterion, which depends on both the radar's operating wavelength and the effective aperture of its antenna array.To overcome these hardware-imposed limitations, recent neural network-based methods have leveraged high-resolution LiDAR data, paired with radar measurements, during training to enhance radar point cloud resolution. While effective, these approaches require extensive paired datasets, which are costly to acquire and prone to calibration error. These challenges motivate the need for methods that can improve radar resolution without relying on paired high-resolution ground-truth data. Here, we introduce an unsupervised radar points enhancement algorithm that employs an arbitrary LiDAR-guided diffusion model as a prior without the need for paired training data. Specifically, our approach formulates radar angle estimation recovery as an inverse problem and incorporates prior knowledge through a diffusion model with arbitrary LiDAR domain knowledge. Experimental results demonstrate that our method attains high fidelity and low noise performance compared to traditional regularization techniques. Additionally, compared to paired training methods, it not only achieves comparable performance but also offers improved generalization capability. To our knowledge, this is the first approach that enhances radar points output by integrating prior knowledge via a diffusion model rather than relying on paired training data. Our code is available at https://github.com/yyxr75/RadarINV.
comment: 19 pages, 15 figures, 4 tables
Infinigen-Sim: Procedural Generation of Articulated Simulation Assets
We introduce Infinigen-Sim, a toolkit which enables users to create diverse and realistic articulated object procedural generators. These tools are composed of high-level utilities for use creating articulated assets in Blender, as well as an export pipeline to integrate the resulting assets into common robotics simulators. We demonstrate our system by creating procedural generators for 5 common articulated object categories. Experiments show that assets sampled from these generators are useful for movable object segmentation, training generalizable reinforcement learning policies, and sim-to-real transfer of imitation learning policies.
Embodied AI in Machine Learning -- is it Really Embodied?
Embodied Artificial Intelligence (Embodied AI) is gaining momentum in the machine learning communities with the goal of leveraging current progress in AI (deep learning, transformers, large language and visual-language models) to empower robots. In this chapter we put this work in the context of "Good Old-Fashioned Artificial Intelligence" (GOFAI) (Haugeland, 1989) and the behavior-based or embodied alternatives (R. A. Brooks 1991; Pfeifer and Scheier 2001). We claim that the AI-powered robots are only weakly embodied and inherit some of the problems of GOFAI. Moreover, we review and critically discuss the possibility of cross-embodiment learning (Padalkar et al. 2024). We identify fundamental roadblocks and propose directions on how to make progress.
comment: 16 pages, 3 figures
TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation
We present TartanGround, a large-scale, multi-modal dataset to advance the perception and autonomy of ground robots operating in diverse environments. This dataset, collected in various photorealistic simulation environments includes multiple RGB stereo cameras for 360-degree coverage, along with depth, optical flow, stereo disparity, LiDAR point clouds, ground truth poses, semantic segmented images, and occupancy maps with semantic labels. Data is collected using an integrated automatic pipeline, which generates trajectories mimicking the motion patterns of various ground robot platforms, including wheeled and legged robots. We collect 910 trajectories across 70 environments, resulting in 1.5 million samples. Evaluations on occupancy prediction and SLAM tasks reveal that state-of-the-art methods trained on existing datasets struggle to generalize across diverse scenes. TartanGround can serve as a testbed for training and evaluation of a broad range of learning-based tasks, including occupancy prediction, SLAM, neural scene representation, perception-based navigation, and more, enabling advancements in robotic perception and autonomy towards achieving robust models generalizable to more diverse scenarios. The dataset and codebase for data collection will be made publicly available upon acceptance. Webpage: https://tartanair.org/tartanground
comment: Under review for IEEE conference
Predicting Human Behavior in Autonomous Systems: A Collaborative Machine Teaching Approach for Reducing Transfer of Control Events
As autonomous systems become integral to various industries, effective strategies for fault handling are essential to ensure reliability and efficiency. Transfer of Control (ToC), a traditional approach for interrupting automated processes during faults, is often triggered unnecessarily in non-critical situations. To address this, we propose a data-driven method that uses human interaction data to train AI models capable of preemptively identifying and addressing issues or assisting users in resolution. Using an interactive tool simulating an industrial vacuum cleaner, we collected data and developed an LSTM-based model to predict user behavior. Our findings reveal that even data from non-experts can effectively train models to reduce unnecessary ToC events, enhancing the system's robustness. This approach highlights the potential of AI to learn directly from human problem-solving behaviors, complementing sensor data to improve industrial automation and human-AI collaboration.
Modular Robot Control with Motor Primitives
Despite a slow neuromuscular system, humans easily outperform modern robot technology, especially in physical contact tasks. How is this possible? Biological evidence indicates that motor control of biological systems is achieved by a modular organization of motor primitives, which are fundamental building blocks of motor behavior. Inspired by neuro-motor control research, the idea of using simpler building blocks has been successfully used in robotics. Nevertheless, a comprehensive formulation of modularity for robot control remains to be established. In this paper, we introduce a modular framework for robot control using motor primitives. We present two essential requirements to achieve modular robot control: independence of modules and closure of stability. We describe key control modules and demonstrate that a wide range of complex robotic behaviors can be generated from this small set of modules and their combinations. The presented modular control framework demonstrates several beneficial properties for robot control, including task-space control without solving Inverse Kinematics, addressing the problems of kinematic singularity and kinematic redundancy, and preserving passivity for contact and physical interactions. Further advantages include exploiting kinematic singularity to maintain high external load with low torque compensation, as well as controlling the robot beyond its end-effector, extending even to external objects. Both simulation and actual robot experiments are presented to validate the effectiveness of our modular framework. We conclude that modularity may be an effective constructive framework for achieving robotic behaviors comparable to human-level performance.
comment: 38 pages, 9 figures, Submitted to International Journal of Robotics Research for Review
Decision Making in Urban Traffic: A Game Theoretic Approach for Autonomous Vehicles Adhering to Traffic Rules
One of the primary challenges in urban autonomous vehicle decision-making and planning lies in effectively managing intricate interactions with diverse traffic participants characterized by unpredictable movement patterns. Additionally, interpreting and adhering to traffic regulations within rapidly evolving traffic scenarios pose significant hurdles. This paper proposed a rule-based autonomous vehicle decision-making and planning framework which extracts right-of-way from traffic rules to generate behavioural parameters, integrating them to effectively adhere to and navigate through traffic regulations. The framework considers the strong interaction between traffic participants mathematically by formulating the decision-making and planning problem into a differential game. By finding the Nash equilibrium of the problem, the autonomous vehicle is able to find optimal decisions. The proposed framework was tested under simulation as well as full-size vehicle platform, the results show that the ego vehicle is able to safely interact with surrounding traffic participants while adhering to traffic rules.
comment: This paper is already accepted on IEEE Transactions on Intelligent Transportation Systems
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouple the rendering process from the computation graph, enabling seamless integration with existing differentiable simulation ecosystems without the need for specialized differentiable rendering software. This decoupling not only reduces computational and memory overhead but also effectively attenuates the policy gradient norm, leading to more stable and smoother optimization. We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. Notably, on complex tasks such as humanoid locomotion, our method achieves a $4\times$ improvement in final return, and successfully learns a humanoid running policy within 4 hours on a single GPU.
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation
Large Language Models (LLMs) show growing promise in autonomous driving by reasoning over complex traffic scenarios to generate path plans. However, their tendencies toward overconfidence, and hallucinations raise critical safety concerns. We introduce SafePath, a modular framework that augments LLM-based path planning with formal safety guarantees using conformal prediction. SafePath operates in three stages. In the first stage, we use an LLM that generates a set of diverse candidate paths, exploring possible trajectories based on agent behaviors and environmental cues. In the second stage, SafePath filters out high-risk trajectories while guaranteeing that at least one safe option is included with a user-defined probability, through a multiple-choice question-answering formulation that integrates conformal prediction. In the final stage, our approach selects the path with the lowest expected collision risk when uncertainty is low or delegates control to a human when uncertainty is high. We theoretically prove that SafePath guarantees a safe trajectory with a user-defined probability, and we show how its human delegation rate can be tuned to balance autonomy and safety. Extensive experiments on nuScenes and Highway-env show that SafePath reduces planning uncertainty by 77\% and collision rates by up to 70\%, demonstrating effectiveness in making LLM-driven path planning more safer.
Embodied Intelligent Industrial Robotics: Concepts and Techniques
In recent years, embodied intelligent robotics (EIR) has made significant progress in multi-modal perception, autonomous decision-making, and physical interaction. Some robots have already been tested in general-purpose scenarios such as homes and shopping malls. We aim to advance the research and application of embodied intelligence in industrial scenes. However, current EIR lacks a deep understanding of industrial environment semantics and the normative constraints between industrial operating objects. To address this gap, this paper first reviews the history of industrial robotics and the mainstream EIR frameworks. We then introduce the concept of the embodied intelligent industrial robotics (EIIR) and propose a knowledge-driven EIIR technology framework for industrial environments. The framework includes four main modules: world model, high-level task planner, low-level skill controller, and simulator. We also review the current development of technologies related to each module and highlight recent progress in adapting them to industrial applications. Finally, we summarize the key challenges EIIR faces in industrial scenarios and suggest future research directions. We believe that EIIR technology will shape the next generation of industrial robotics. Industrial systems based on embodied intelligent industrial robots offer strong potential for enabling intelligent manufacturing. We will continue to track and summarize new research in this area and hope this review will serve as a valuable reference for scholars and engineers interested in industrial embodied intelligence. Together, we can help drive the rapid advancement and application of this technology. The associated project can be found at https://github.com/jackyzengl/EIIR.
comment: 60 pages, 11 figures. The associated project can be found at https://github.com/jackyzengl/EIIR
VGC-RIO: A Tightly Integrated Radar-Inertial Odometry with Spatial Weighted Doppler Velocity and Local Geometric Constrained RCS Histograms
Recent advances in 4D radar-inertial odometry have demonstrated promising potential for autonomous lo calization in adverse conditions. However, effective handling of sparse and noisy radar measurements remains a critical challenge. In this paper, we propose a radar-inertial odometry with a spatial weighting method that adapts to unevenly distributed points and a novel point-description histogram for challenging point registration. To make full use of the Doppler velocity from different spatial sections, we propose a weighting calculation model. To enhance the point cloud registration performance under challenging scenarios, we con struct a novel point histogram descriptor that combines local geometric features and radar cross-section (RCS) features. We have also conducted extensive experiments on both public and self-constructed datasets. The results demonstrate the precision and robustness of the proposed VGC-RIO.
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Mimicry is a fundamental learning mechanism in humans, enabling individuals to learn new tasks by observing and imitating experts. However, applying this ability to robots presents significant challenges due to the inherent differences between human and robot embodiments in both their visual appearance and physical capabilities. While previous methods bridge this gap using cross-embodiment datasets with shared scenes and tasks, collecting such aligned data between humans and robots at scale is not trivial. In this paper, we propose UniSkill, a novel framework that learns embodiment-agnostic skill representations from large-scale cross-embodiment video data without any labels, enabling skills extracted from human video prompts to effectively transfer to robot policies trained only on robot data. Our experiments in both simulation and real-world environments show that our cross-embodiment skills successfully guide robots in selecting appropriate actions, even with unseen video prompts. The project website can be found at: https://kimhanjung.github.io/UniSkill.
comment: Project Page: https://kimhanjung.github.io/UniSkill/
Remote Manipulation of Multiple Objects with Airflow Field Using Model-Based Learning Control
Non-contact manipulation is a promising methodology in robotics, offering a wide range of scientific and industrial applications. Among the proposed approaches, airflow stands out for its ability to project across considerable distances and its flexibility in actuating objects of varying materials, sizes, and shapes. However, predicting airflow fields at a distance-and the motion of objects within them-remains notoriously challenging due to their nonlinear and stochastic nature. Here, we propose a model-based learning approach using a jet-induced airflow field for remote multi-object manipulation on a surface. Our approach incorporates an analytical model of the field, learned object dynamics, and a model-based controller. The model predicts an air velocity field over an infinite surface for a specified jet orientation, while the object dynamics are learned through a robust system identification algorithm. Using the model-based controller, we can automatically and remotely, at meter-scale distances, control the motion of single and multiple objects for different tasks, such as path-following, aggregating, and sorting.
comment: 8 pages, 7 figures
Latent Action Pretraining from Videos ICLR 2025
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human teleoperators during pretraining, which significantly limits possible data sources and scale. In this work, we propose a method to learn from internet-scale videos that do not have robot action labels. We first train an action quantization model leveraging VQ-VAE-based objective to learn discrete latent actions between image frames, then pretrain a latent VLA model to predict these latent actions from observations and task descriptions, and finally finetune the VLA on small-scale robot manipulation data to map from latent to robot actions. Experimental results demonstrate that our method significantly outperforms existing techniques that train robot manipulation policies from large-scale videos. Furthermore, it outperforms the state-of-the-art VLA model trained with robotic action labels on real-world manipulation tasks that require language conditioning, generalization to unseen objects, and semantic generalization to unseen instructions. Training only on human manipulation videos also shows positive transfer, opening up the potential for leveraging web-scale data for robotics foundation model.
comment: ICLR 2025 Website: https://latentactionpretraining.github.io
LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics
This research focuses on how Large Language Models (LLMs) can help with (path) planning for mobile embodied agents such as robots, in a human-in-the-loop and interactive manner. A novel framework named LLM A*, aims to leverage the commonsense of LLMs, and the utility-optimal A* is proposed to facilitate few-shot near-optimal path planning. Prompts are used for two main purposes: 1) to provide LLMs with essential information like environments, costs, heuristics, etc.; 2) to communicate human feedback on intermediate planning results to LLMs. This approach takes human feedback on board and renders the entire planning process transparent (akin to a `white box') to humans. Moreover, it facilitates code-free path planning, thereby fostering the accessibility and inclusiveness of artificial intelligence techniques to communities less proficient in coding. Comparative analysis against A* and RL demonstrates that LLM A* exhibits greater efficiency in terms of search space and achieves paths comparable to A* while outperforming RL. The interactive nature of LLM A* also makes it a promising tool for deployment in collaborative human-robot tasks. Codes and Supplemental Materials can be found at GitHub: https://github.com/speedhawk/LLM-A-.
comment: 7 figures, 8 pages
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
comment: Accepted to RSS 2025. Code is available at https://github.com/OpenDriveLab/UniVLA
Temporal Triplane Transformers as Occupancy World Models
World models aim to learn or construct representations of the environment that enable the prediction of future scenes, thereby supporting intelligent motion planning. However, existing models often struggle to produce fine-grained predictions and to operate in real time. In this work, we propose T$^3$Former, a novel 4D occupancy world model for autonomous driving. T$^3$Former begins by pre-training a compact {\em triplane} representation that efficiently encodes 3D occupancy. It then extracts multi-scale temporal motion features from historical triplanes and employs an autoregressive approach to iteratively predict future triplane changes. Finally, these triplane changes are combined with previous states to decode future occupancy and ego-motion trajectories. Experimental results show that T$^3$Former achieves 1.44$\times$ speedup (26 FPS), improves mean IoU to 36.09, and reduces mean absolute planning error to 1.0 meters. Demos are available in the supplementary material.
Multi-layer Motion Planning with Kinodynamic and Spatio-Temporal Constraints SC
We propose a novel, multi-layered planning approach for computing paths that satisfy both kinodynamic and spatiotemporal constraints. Our three-part framework first establishes potential sequences to meet spatial constraints, using them to calculate a geometric lead path. This path then guides an asymptotically optimal sampling-based kinodynamic planner, which minimizes an STL-robustness cost to jointly satisfy spatiotemporal and kinodynamic constraints. In our experiments, we test our method with a velocity-controlled Ackerman-car model and demonstrate significant efficiency gains compared to prior art. Additionally, our method is able to generate complex path maneuvers, such as crossovers, something that previous methods had not demonstrated.
comment: Accepted to ACM Hybrid Systems: Computation and Control (HSCC) 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Developing autonomous home robots controlled by natural language has long been a pursuit of humanity. While advancements in large language models (LLMs) and embodied intelligence make this goal closer, several challenges persist: the lack of a unified benchmark for more complex robot tasks, limited evaluation methods and metrics, data incompatibility between LLMs and mobile manipulation trajectories. To address these issues, we propose Embodied Mobile Manipulation in Open Environments (EMMOE), a benchmark that requires agents to interpret user instructions and execute long-horizon everyday tasks in continuous space. EMMOE seamlessly integrates high-level and low-level embodied tasks into a unified framework, along with three new metrics for more diverse assessment. Additionally, we collect~\dataset, which features in various task attributes, detailed process annotations, re-plans after failures, and two sub-datasets for LLM training. Furthermore, we design~\model, a sophisticated agent system consists of LLM with Direct Preference Optimization (DPO), light weighted navigation and manipulation models, and multiple error detection mechanisms. Finally, we demonstrate~\model's performance and evaluations of different models and policies.
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Human videos offer a scalable way to train robot manipulation policies, but lack the action labels needed by standard imitation learning algorithms. Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly. We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense and transferable signal for learning robot policies. X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards. These rewards are used to train a reinforcement learning (RL) policy in simulation. The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting. To transfer to the real world, X-Sim introduces an online domain adaptation technique that aligns real and simulated observations during deployment. Importantly, X-Sim does not require any robot teleoperation data. We evaluate it across 5 manipulation tasks in 2 environments and show that it: (1) improves task progress by 30% on average over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with 10x less data collection time, and (3) generalizes to new camera viewpoints and test-time changes. Code and videos are available at https://portal-cornell.github.io/X-Sim/.
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework trained solely in simulation and can zero-shot transfer to different embodiments in diverse real-world environments. The key ingredient of NavDP's network is the combination of diffusion-based trajectory generation and a critic function for trajectory selection, which are conditioned on only local observation tokens encoded from a shared policy transformer. Given the privileged information of the global environment in simulation, we scale up the demonstrations of good quality to train the diffusion policy and formulate the critic value function targets with contrastive negative samples. Our demonstration generation approach achieves about 2,500 trajectories/GPU per day, 20$\times$ more efficient than real-world data collection, and results in a large-scale navigation dataset with 363.2km trajectories across 1244 scenes. Trained with this simulation dataset, NavDP achieves state-of-the-art performance and consistently outstanding generalization capability on quadruped, wheeled, and humanoid robots in diverse indoor and outdoor environments. In addition, we present a preliminary attempt at using Gaussian Splatting to make in-domain real-to-sim fine-tuning to further bridge the sim-to-real gap. Experiments show that adding such real-to-sim data can improve the success rate by 30\% without hurting its generalization capability.
comment: Project Page: https://wzcai99.github.io/navigation-diffusion-policy.github.io/
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model's (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 20.81% in satisfying traversability constraints and 28.51% in terms of human-like navigation in four different outdoor navigation scenarios.
Distortion Bounds of Subdivision Models for SO(3)
In the subdivision approach to robot path planning, we need to subdivide the configuration space of a robot into nice cells to perform various computations. For a rigid spatial robot, this configuration space is $SE(3)=\mathbb{R}^3\times SO(3)$. The subdivision of $\mathbb{R}^3$ is standard but so far, there are no global subdivision schemes for $SO(3)$. We recently introduced a representation for $SO(3)$ suitable for subdivision. This paper investigates the distortion of the natural metric on $SO(3)$ caused by our representation. The proper framework for this study lies in the Riemannian geometry of $SO(3)$, enabling us to obtain sharp distortion bounds.
comment: 10 pages, 1 figure. Submitted to 3rd IMA Robotics Conferences, 2025
Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning
Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle with observations outside of the training distribution. 3D scene representations that incorporate observations from calibrated RGBD cameras have been proposed as a way to mitigate this, but in our evaluations with unseen embodiments and camera viewpoints they show only modest improvement. To address those challenges, we propose Adapt3R, a general-purpose 3D observation encoder which synthesizes data from calibrated RGBD cameras into a vector that can be used as conditioning for arbitrary IL algorithms. The key idea is to use a pretrained 2D backbone to extract semantic information, using 3D only as a medium to localize this information with respect to the end-effector. We show across 93 simulated and 6 real tasks that when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms' learning capacity while enabling zero-shot transfer to novel embodiments and camera poses.
comment: Videos, code, and data: https://pairlab.github.io/Adapt3R
MRNaB: Mixed Reality-based Robot Navigation Interface using Optical-see-through MR-beacons
Recent advancements in robotics have led to the development of numerous interfaces to enhance the intuitiveness of robot navigation. However, the reliance on traditional 2D displays imposes limitations on the simultaneous visualization of information. Mixed Reality (MR) technology addresses this issue by enhancing the dimensionality of information visualization, allowing users to perceive multiple pieces of information concurrently. This paper proposes the Mixed Reality-based Robot Navigation Interface using an Optical-see-through MR-beacons (MRNaB), a novel approach that uses MR-beacons created with an ``air tap'', situated in the real world. This beacon is persistent, enabling multi-destination visualization and functioning as a signal transmitter for robot navigation, eliminating the need for repeated navigation inputs. Our system is mainly constructed into four primary functions: ``Add'', ``Move'', ``Delete'', and ``Select''. These allow for the addition of MR-beacons, location movement, its deletion, and the selection of MR-beacons for navigation purposes, respectively. To validate the effectiveness, we conducted comprehensive experiments comparing MRNaB with traditional 2D navigation systems. The results show significant improvements in user performance, both objectively and subjectively, confirming that the MRNaB enhances navigation efficiency and user experience. For additional material, please check: https://mertcookimg.github.io/mrnab
Multiagent Systems
Multi-Agent Path Finding For Large Agents Is Intractable
The multi-agent path finding (MAPF) problem asks to find a set of paths on a graph such that when synchronously following these paths the agents never encounter a conflict. In the most widespread MAPF formulation, the so-called Classical MAPF, the agents sizes are neglected and two types of conflicts are considered: occupying the same vertex or using the same edge at the same time step. Meanwhile in numerous practical applications, e.g. in robotics, taking into account the agents' sizes is vital to ensure that the MAPF solutions can be safely executed. Introducing large agents yields an additional type of conflict arising when one agent follows an edge and its body overlaps with the body of another agent that is actually not using this same edge (e.g. staying still at some distinct vertex of the graph). Until now it was not clear how harder the problem gets when such conflicts are to be considered while planning. Specifically, it was known that Classical MAPF problem on an undirected graph can be solved in polynomial time, however no complete polynomial-time algorithm was presented to solve MAPF with large agents. In this paper we, for the first time, establish that the latter problem is NP-hard and, thus, if P!=NP no polynomial algorithm for it can, unfortunately, be presented. Our proof is based on the prevalent in the field technique of reducing the seminal 3SAT problem (which is known to be an NP-complete problem) to the problem at hand. In particular, for an arbitrary 3SAT formula we procedurally construct a dedicated graph with specific start and goal vertices and show that the given 3SAT formula is satisfiable iff the corresponding path finding instance has a solution.
pc-dbCBS: Kinodynamic Motion Planning of Physically-Coupled Robot Teams
Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.
comment: This work has been submitted to the IEEE for possible publication
Near Optimal Best Arm Identification for Clustered Bandits ICML 2025
This work investigates the problem of best arm identification for multi-agent multi-armed bandits. We consider $N$ agents grouped into $M$ clusters, where each cluster solves a stochastic bandit problem. The mapping between agents and bandits is a priori unknown. Each bandit is associated with $K$ arms, and the goal is to identify the best arm for each agent under a $\delta$-probably correct ($\delta$-PC) framework, while minimizing sample complexity and communication overhead. We propose two novel algorithms: Clustering then Best Arm Identification (Cl-BAI) and Best Arm Identification then Clustering (BAI-Cl). Cl-BAI uses a two-phase approach that first clusters agents based on the bandit problems they are learning, followed by identifying the best arm for each cluster. BAI-Cl reverses the sequence by identifying the best arms first and then clustering agents accordingly. Both algorithms leverage the successive elimination framework to ensure computational efficiency and high accuracy. We establish $\delta$-PC guarantees for both methods, derive bounds on their sample complexity, and provide a lower bound for this problem class. Moreover, when $M$ is small (a constant), we show that the sample complexity of a variant of BAI-Cl is minimax optimal in an order-wise sense. Experiments on synthetic and real-world datasets (MovieLens, Yelp) demonstrate the superior performance of the proposed algorithms in terms of sample and communication efficiency, particularly in settings where $M \ll N$.
comment: To be published in ICML 2025
CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation
The rapid development of generative artificial intelligence (GenAI) presents new opportunities to advance the cartographic process. Previous studies have either overlooked the artistic aspects of maps or faced challenges in creating both accurate and informative maps. In this study, we propose CartoAgent, a novel multi-agent cartographic framework powered by multimodal large language models (MLLMs). This framework simulates three key stages in cartographic practice: preparation, map design, and evaluation. At each stage, different MLLMs act as agents with distinct roles to collaborate, discuss, and utilize tools for specific purposes. In particular, CartoAgent leverages MLLMs' visual aesthetic capability and world knowledge to generate maps that are both visually appealing and informative. By separating style from geographic data, it can focus on designing stylesheets without modifying the vector-based data, thereby ensuring geographic accuracy. We applied CartoAgent to a specific task centered on map restyling-namely, map style transfer and evaluation. The effectiveness of this framework was validated through extensive experiments and a human evaluation study. CartoAgent can be extended to support a variety of cartographic design decisions and inform future integrations of GenAI in cartography.
comment: 57 pages, 17 figures
Demystifying AI Agents: The Final Generation of Intelligence
The trajectory of artificial intelligence (AI) has been one of relentless acceleration, evolving from rudimentary rule-based systems to sophisticated, autonomous agents capable of complex reasoning and interaction. This whitepaper chronicles this remarkable journey, charting the key technological milestones--advancements in prompting, training methodologies, hardware capabilities, and architectural innovations--that have converged to create the AI agents of today. We argue that these agents, exemplified by systems like OpenAI's ChatGPT with plugins and xAI's Grok, represent a culminating phase in AI development, potentially constituting the "final generation" of intelligence as we currently conceive it. We explore the capabilities and underlying technologies of these agents, grounded in practical examples, while also examining the profound societal implications and the unprecedented pace of progress that suggests intelligence is now doubling approximately every six months. The paper concludes by underscoring the critical need for wisdom and foresight in navigating the opportunities and challenges presented by this powerful new era of intelligence.
Decision Making in Urban Traffic: A Game Theoretic Approach for Autonomous Vehicles Adhering to Traffic Rules
One of the primary challenges in urban autonomous vehicle decision-making and planning lies in effectively managing intricate interactions with diverse traffic participants characterized by unpredictable movement patterns. Additionally, interpreting and adhering to traffic regulations within rapidly evolving traffic scenarios pose significant hurdles. This paper proposed a rule-based autonomous vehicle decision-making and planning framework which extracts right-of-way from traffic rules to generate behavioural parameters, integrating them to effectively adhere to and navigate through traffic regulations. The framework considers the strong interaction between traffic participants mathematically by formulating the decision-making and planning problem into a differential game. By finding the Nash equilibrium of the problem, the autonomous vehicle is able to find optimal decisions. The proposed framework was tested under simulation as well as full-size vehicle platform, the results show that the ego vehicle is able to safely interact with surrounding traffic participants while adhering to traffic rules.
comment: This paper is already accepted on IEEE Transactions on Intelligent Transportation Systems
Agent Name Service (ANS): A Universal Directory for Secure AI Agent Discovery and Interoperability SP
The proliferation of AI agents requires robust mechanisms for secure discovery. This paper introduces the Agent Name Service (ANS), a novel architecture based on DNS addressing the lack of a public agent discovery framework. ANS provides a protocol-agnostic registry infrastructure that leverages Public Key Infrastructure (PKI) certificates for verifiable agent identity and trust. The architecture features several key innovations: a formalized agent registration and renewal mechanism for lifecycle management; DNS-inspired naming conventions with capability-aware resolution; a modular Protocol Adapter Layer supporting diverse communication standards (A2A, MCP, ACP etc.); and precisely defined algorithms for secure resolution. We implement structured communication using JSON Schema and conduct a comprehensive threat analysis of our proposal. The result is a foundational directory service addressing the core challenges of secured discovery and interaction in multi-agent systems, paving the way for future interoperable, trustworthy, and scalable agent ecosystems.
comment: 15 pages, 6 figures, 6 code listings, Supported and endorsed by OWASP GenAI ASI Project
Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
Multi-agent systems built on large language models (LLMs) promise enhanced problem-solving through distributed information integration, but also risk replicating collective reasoning failures observed in human groups. Yet, no theory-grounded benchmark exists to systematically evaluate such failures. In this paper, we introduce the Hidden Profile paradigm from social psychology as a diagnostic testbed for multi-agent LLM systems. By distributing critical information asymmetrically across agents, the paradigm reveals how inter-agent dynamics support or hinder collective reasoning. We first formalize the paradigm for multi-agent decision-making under distributed knowledge and instantiate it as a benchmark with nine tasks spanning diverse scenarios, including adaptations from prior human studies. We then conduct experiments with GPT-4.1 and five other leading LLMs, including reasoning-enhanced variants, showing that multi-agent systems across all models fail to match the accuracy of single agents given complete information. While agents' collective performance is broadly comparable to that of human groups, nuanced behavioral differences emerge, such as increased sensitivity to social desirability. Finally, we demonstrate the paradigm's diagnostic utility by exploring a cooperation-contradiction trade-off in multi-agent LLM systems. We find that while cooperative agents are prone to over-coordination in collective settings, increased contradiction impairs group convergence. This work contributes a reproducible framework for evaluating multi-agent LLM systems and motivates future research on artificial collective intelligence and human-AI interaction.
Learning Graph Representation of Agent Diffusers AAMAS2025
Diffusion-based generative models have significantly advanced text-to-image synthesis, demonstrating impressive text comprehension and zero-shot generalization. These models refine images from random noise based on textual prompts, with initial reliance on text input shifting towards enhanced visual fidelity over time. This transition suggests that static model parameters might not optimally address the distinct phases of generation. We introduce LGR-AD (Learning Graph Representation of Agent Diffusers), a novel multi-agent system designed to improve adaptability in dynamic computer vision tasks. LGR-AD models the generation process as a distributed system of interacting agents, each representing an expert sub-model. These agents dynamically adapt to varying conditions and collaborate through a graph neural network that encodes their relationships and performance metrics. Our approach employs a coordination mechanism based on top-$k$ maximum spanning trees, optimizing the generation process. Each agent's decision-making is guided by a meta-model that minimizes a novel loss function, balancing accuracy and diversity. Theoretical analysis and extensive empirical evaluations show that LGR-AD outperforms traditional diffusion models across various benchmarks, highlighting its potential for scalable and flexible solutions in complex image generation tasks. Code is available at: https://github.com/YousIA/LGR_AD
comment: Accepted at AAMAS2025 International Conference on Autonomous Agents and Multiagent Systems
Learning Progress Driven Multi-Agent Curriculum ICML 2025
The number of agents can be an effective curriculum variable for controlling the difficulty of multi-agent reinforcement learning (MARL) tasks. Existing work typically uses manually defined curricula such as linear schemes. We identify two potential flaws while applying existing reward-based automatic curriculum learning methods in MARL: (1) The expected episode return used to measure task difficulty has high variance; (2) Credit assignment difficulty can be exacerbated in tasks where increasing the number of agents yields higher returns which is common in many MARL tasks. To address these issues, we propose to control the curriculum by using a TD-error based *learning progress* measure and by letting the curriculum proceed from an initial context distribution to the final task specific one. Since our approach maintains a distribution over the number of agents and measures learning progress rather than absolute performance, which often increases with the number of agents, we alleviate problem (2). Moreover, the learning progress measure naturally alleviates problem (1) by aggregating returns. In three challenging sparse-reward MARL benchmarks, our approach outperforms state-of-the-art baselines.
comment: ICML 2025
Systems and Control (CS)
Can On Body Sensing Be Spatial Adaptive?
Wearable sensors are typically affixed to specific locations on the human body, and their position remains static, only changing unintentionally due to motion artifacts. This static configuration introduces significant limitations. As a result, current systems miss the opportunity to capture dynamic physiological data from diverse body regions. This research investigates the potential of developing movable sensors that adaptively reposition themselves to sample different areas of interest on the body, addressing gaps in spatial coverage. We designed, developed, and fabricated a 3 x 3 matrix platform to support moving sensors from one location to another. We validated the feasibility through simulations on a matrix of up to 9 x 9 locations with up to 16 concurrent sensors and real-world prototype characterization.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines represent complex highly nonlinear dynamical systems. Deriving their physics-based models can be challenging as it requires performance characteristics, that are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in a better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.
comment: 51 pages, 28 figures
AutoCam: Hierarchical Path Planning for an Autonomous Auxiliary Camera in Surgical Robotics
Incorporating an autonomous auxiliary camera into robot-assisted minimally invasive surgery (RAMIS) enhances spatial awareness and eliminates manual viewpoint control. Existing path planning methods for auxiliary cameras track two-dimensional surgical features but do not simultaneously account for camera orientation, workspace constraints, and robot joint limits. This study presents AutoCam: an automatic auxiliary camera placement method to improve visualization in RAMIS. Implemented on the da Vinci Research Kit, the system uses a priority-based, workspace-constrained control algorithm that combines heuristic geometric placement with nonlinear optimization to ensure robust camera tracking. A user study (N=6) demonstrated that the system maintained 99.84% visibility of a salient feature and achieved a pose error of 4.36 $\pm$ 2.11 degrees and 1.95 $\pm$ 5.66 mm. The controller was computationally efficient, with a loop time of 6.8 $\pm$ 12.8 ms. An additional pilot study (N=6), where novices completed a Fundamentals of Laparoscopic Surgery training task, suggests that users can teleoperate just as effectively from AutoCam's viewpoint as from the endoscope's while still benefiting from AutoCam's improved visual coverage of the scene. These results indicate that an auxiliary camera can be autonomously controlled using the da Vinci patient-side manipulators to track a salient feature, laying the groundwork for new multi-camera visualization methods in RAMIS.
comment: 13 pages, 9 figures
Unlocking Innate Computing Abilities in Electric Grids
High energy consumption of artificial intelligence has gained momentum worldwide, which necessitates major investments on expanding efficient and carbon-neutral generation and data center infrastructure in electric power grids. Going beyond the conventional ideation, this article unleashes innate computational abilities in the power grid network circuits itself. By programming power electronic converters (PECs) to mimic biological neurons, we sustainably transform power grids into a neural network and enable it to optimize, compute and make data-driven decisions using distributed PECs. Instead of seen merely as an energy delivery platform, this article conceptualizes a novel application for electric grid to be used as a computing asset without affecting its operation. To illustrate its computational abilities, we solve a affine transformation task in a microgrid with five PECs. By encoding the digital data into the control of PECs, our preliminary results conclude that computing using electric grids does not disturb its operation. From a scientific perspective, this work fundamentally merges energy and computing optimization theories by harnessing inherent high-dimensional computational relationships in electric grids.
Spatially Selective Active Noise Control for Open-fitting Hearables with Acausal Optimization
Recent advances in active noise control have enabled the development of hearables with spatial selectivity, which actively suppress undesired noise while preserving desired sound from specific directions. In this work, we propose an improved approach to spatially selective active noise control that incorporates acausal relative impulse responses into the optimization process, resulting in significantly improved performance over the causal design. We evaluate the system through simulations using a pair of open-fitting hearables with spatially localized speech and noise sources in an anechoic environment. Performance is evaluated in terms of speech distortion, noise reduction, and signal-to-noise ratio improvement across different delays and degrees of acausality. Results show that the proposed acausal optimization consistently outperforms the causal approach across all metrics and scenarios, as acausal filters more effectively characterize the response of the desired source.
comment: Forum Acusticum/Euronoise 2025
A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024
Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems. This paper presents the solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024). The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market. Key components include: (1) a stacking-based approach combining sister forecasts from various Numerical Weather Predictions (NWPs) to provide wind power forecasts, (2) an online solar post-processing model to address the distribution shift in the online test set caused by increased solar capacity, (3) a probabilistic aggregation method for accurate quantile forecasts of hybrid generation, and (4) a stochastic trading strategy to maximize expected trading revenue considering uncertainties in electricity prices. This paper also explores the potential of end-to-end learning to further enhance the trading revenue by adjusting the distribution of forecast errors. Detailed case studies are provided to validate the effectiveness of these proposed methods. Code for all mentioned methods is available for reproduction and further research in both industry and academia.
comment: Solution description of IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom)
pc-dbCBS: Kinodynamic Motion Planning of Physically-Coupled Robot Teams
Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.
comment: This work has been submitted to the IEEE for possible publication
A Comparative Study of SMT and MILP for the Nurse Rostering Problem
The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.
comment: 6 pages, 3 figures
Country wide Shared FibreBased Infrastructure for Dissemination of Precise Time, Coherent Optical Frequency with Vibration Sensing
With the increasing demand for ultra-precise time synchronization and frequency dissemination across various scientific, industrial, and communication fields, the Czech Republic has developed an innovative, non-commercial fiber-based infrastructure. This infrastructure serves as a shared platform, utilizing optical fibers to enable high-precision timing, coherent frequency transfer, and a newly implemented vibrational sensing capability. The project also addresses challenges posed by classical communication noise-particularly from Raman scattering-on quantum channels, especially for Quantum Key Distribution (QKD). By strategically separating classical and quantum channels into distinct wavelength bands, such as the C-band and O-band, the infrastructure achieves minimal interference while enabling multiple concurrent applications over shared fiber lines.
comment: 4 pages, 2 figures, conference paper
Self Clocked Digital LDO for Cryogenic Power Management in 22nm FDSOI with 98 Percent Efficiency
A universal quantum computer~(QC), though promising ground breaking solutions to complex problems, still faces several challenges with respect to scalability. Current state-of-the-art QC use a great quantity of cables to connect the physical qubits, situated in the cryogenic temperature, to room temperature electronics. Integrated cryogenic electronics together with semiconductor spin qubits is one way closer for scalability. Such a scalable quantum computer can have qubits and the control electronics at 4K stage. Being at 4K, more thermal dissipation is allowed without overloading the cooling capability of the fridge. Still, control and power circuitry is expected to be highly efficient. While commercial CMOS technologies are found to be operatable at \qty{}{mK}, lack of reliable cryogenic models while designing, increased mismatches at cryo temperatures makes the design challenging and risky. Using an FDSOI technology with backgate biasing to compensate for the threshold voltage drift happening at cryo~(compensating around 200mV) and digital circuitry is a way to address this challenge. In this work, a self-clocked digital low dropout regulator (DLDO) is designed in FDSOI for high power efficient, variation tolerant regulator to supply cryogenic circuits for Quantum computing. The proposed digital LDO is more resilient to mismatch and having self clocking and close and fine loops addresses the power efficiency and faster transient response.
Angle diversity receiver as a key enabler for reliable ORIS-based Visible Light Communication
Visible Light Communication (VLC) offers a promising solution to satisfy the increasing demand for wireless data. However, link blockages remain a significant challenge. This paper addresses this issue by investigating the combined use of angle diversity receivers (ADRs) and optical reconfigurable intelligent surfaces (ORISs) in multiuser VLC systems. We consider ORIS elements as small movable mirrors. We demonstrate the complementarity of ADR and ORIS in mitigating link blockages, as well as the advantages of using a larger number of ORIS elements due to the increased field-of-view (FoV) at the receiver enabled by the ADR. An optimization algorithm is proposed to maximize the minimum signal-to-noise power ratio (SNR) to deploy a fair communication network. Numerical results show that integrating ADR and ORIS significantly enhances VLC communication performance, achieving an SNR gain of up to 30 dB compared to a system without ORIS, and mitigating communication outages produced by link blockages or out-of-FoV received signals. We also prove that an ADR with a single tier of photodiodes is sufficient to complement ORIS-assisted VLC.
Discontinuous integro-differential control systems with sliding modes
The paper deals with analysis and design sliding mode control systems modeled by integro-differential equations. Filippov method and equivalent control approach are extended to a class of nonlinear discontinuous integro-differential equations. Sliding mode control algorithm is designed for a control system with distributed input delay. The obtained results are illustrated by numerical example.
DB InfraGO's Automated Dispatching Assistant ADA-PMB
As railway infrastructure manager, DB InfraGO AG is faced with the challenge of offering fluid and punctual operation despite rising demand and increased construction activity. The high capacity utilisation, especially in the core network sections, causes delays to be propagated quickly and widely across the entire network. Up to now, conflicts between train runs can be identified automatically, but dispatching measures have been based on past human experience. An automated dispatching assistance system is currently being piloted to provide support for train dispatchers in their work. The aim is to offer them helpful dispatching recommendations, particularly in stressful situations with a high conflict density in the network section under consideration, in order to ensure the most efficient operation of the system. The recommendations are currently displayed separately alongside the central control system. In future, they will be integrated into the central control system, which will significantly simplify communication between the train dispatcher and signal setter. Further development steps for the integration process are also presented and discussed.
Improving Power Systems Controllability via Edge Centrality Measures
Improving the controllability of power networks is crucial as they are highly complex networks operating in synchrony; even minor perturbations can cause desynchronization and instability. To that end, one needs to assess the criticality of key network components (buses and lines) in terms of their impact on system performance. Traditional methods to identify the key nodes/edges in power networks often rely on static centrality measures based on the network's topological structure ignoring the network's dynamic behavior. In this paper, using multi-machine power network models and a new control-theoretic edge centrality matrix (ECM) approach, we: (i) quantify the influence of edges (i.e., the line susceptances) in terms of controllability performance metrics, (ii) identify the most influential lines, and (iii) compute near-optimal edge modifications that improve the power network controllability. Employing various IEEE power network benchmarks, we validate the effectiveness of the ECM-based algorithm and demonstrate improvements in system reachability, control, and damping performance.
Planar Herding of Multiple Evaders with a Single Herder
A planar herding problem is considered, where a superior pursuer herds a flock of non-cooperative, inferior evaders around a predefined target point. An inverse square law of repulsion is assumed between the pursuer and each evader. Two classes of pursuer trajectories are proposed: (i) a constant angular-velocity spiral, and (ii) a constant angular-velocity circle, both centered around the target point. For the spiraling pursuer, the radial velocity is dynamically adjusted based on a feedback law that depends on the instantaneous position of the evader, which is located at the farthest distance from the target at the start of the game. It is shown that, under suitable choices of the model parameters, all the evaders are herded into an arbitrarily small limit cycle around the target point. Meanwhile, the pursuer also converges onto a circular trajectory around the target. The conditions for the stability of these limit cycles are derived. For the circling pursuer, similar guarantees are provided along with explicit formulas for the radii of the limit cycles.
comment: 12 pages, 16 figures
Fast Heuristic Scheduling and Trajectory Planning for Robotic Fruit Harvesters with Multiple Cartesian Arms
This work proposes a fast heuristic algorithm for the coupled scheduling and trajectory planning of multiple Cartesian robotic arms harvesting fruits. Our method partitions the workspace, assigns fruit-picking sequences to arms, determines tight and feasible fruit-picking schedules and vehicle travel speed, and generates smooth, collision-free arm trajectories. The fruit-picking throughput achieved by the algorithm was assessed using synthetically generated fruit coordinates and a harvester design featuring up to 12 arms. The throughput increased monotonically as more arms were added. Adding more arms when fruit densities were low resulted in diminishing gains because it took longer to travel from one fruit to another. However, when there were enough fruits, the proposed algorithm achieved a linear speedup as the number of arms increased.
comment: This work will be submitted to the IEEE for possible publication
Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations
Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, whereas they could introduce the ``leaking corner issue", leading to inaccuracies in the results. In this paper, we define the ``leaking corner issue" in terms of value functions, propose and prove a necessary condition for its occurrence. We then use these theoretical contributions to introduce a new local updating method that efficiently corrects inaccurate value functions while maintaining the computational efficiency of the dimensionality reduction approaches. We demonstrate the effectiveness of our method through numerical simulations. Although we validate our method with the self-contained subsystem decomposition (SCSD), our approach is applicable to other dimensionality reduction techniques that introduce the ``leaking corners".
comment: 7 pages, Submitted to Conference on Decision and Control (CDC)
Provably safe and human-like car-following behaviors: Part 2. A parsimonious multi-phase model with projected braking
Ensuring safe and human-like trajectory planning for automated vehicles amidst real-world uncertainties remains a critical challenge. While existing car-following models often struggle to consistently provide rigorous safety proofs alongside human-like acceleration and deceleration patterns, we introduce a novel multi-phase projection-based car-following model. This model is designed to balance safety and performance by incorporating bounded acceleration and deceleration rates while emulating key human driving principles. Building upon a foundation of fundamental driving principles and a multi-phase dynamical systems analysis (detailed in Part 1 of this study \citep{jin2025WA20-02_Part1}), we first highlight the limitations of extending standard models like Newell's with simple bounded deceleration. Inspired by human drivers' anticipatory behavior, we mathematically define and analyze projected braking profiles for both leader and follower vehicles, establishing safety criteria and new phase definitions based on the projected braking lead-vehicle problem. The proposed parsimonious model combines an extended Newell's model for nominal driving with a new control law for scenarios requiring projected braking. Using speed-spacing phase plane analysis, we provide rigorous mathematical proofs of the model's adherence to defined safe and human-like driving principles, including collision-free operation, bounded deceleration, and acceptable safe stopping distance, under reasonable initial conditions. Numerical simulations validate the model's superior performance in achieving both safety and human-like braking profiles for the stationary lead-vehicle problem. Finally, we discuss the model's implications and future research directions.
comment: 27 pages, 4 figures
Provably safe and human-like car-following behaviors: Part 1. Analysis of phases and dynamics in standard models
Trajectory planning is essential for ensuring safe driving in the face of uncertainties related to communication, sensing, and dynamic factors such as weather, road conditions, policies, and other road users. Existing car-following models often lack rigorous safety proofs and the ability to replicate human-like driving behaviors consistently. This article applies multi-phase dynamical systems analysis to well-known car-following models to highlight the characteristics and limitations of existing approaches. We begin by formulating fundamental principles for safe and human-like car-following behaviors, which include zeroth-order principles for comfort and minimum jam spacings, first-order principles for speeds and time gaps, and second-order principles for comfort acceleration/deceleration bounds as well as braking profiles. From a set of these zeroth- and first-order principles, we derive Newell's simplified car-following model. Subsequently, we analyze phases within the speed-spacing plane for the stationary lead-vehicle problem in Newell's model and its extensions, which incorporate both bounded acceleration and deceleration. We then analyze the performance of the Intelligent Driver Model and the Gipps model. Through this analysis, we highlight the limitations of these models with respect to some of the aforementioned principles. Numerical simulations and empirical observations validate the theoretical insights. Finally, we discuss future research directions to further integrate safety, human-like behaviors, and vehicular automation in car-following models, which are addressed in Part 2 of this study \citep{jin2025WA20-02_Part2}, where we develop a novel multi-phase projection-based car-following model that addresses the limitations identified here.
comment: 29 pages, 7 figures
Event-Triggered Synergistic Controllers with Dwell-Time Transmission
We propose novel event-triggered synergistic controllers for nonlinear continuous-time plants by incorporating event-triggered control into stabilizing synergistic controllers. We highlight that a naive application of common event-triggering conditions may not ensure dwell-time transmission due to the joint jumping dynamics of the closed-loop system. Under mild conditions, we develop a suite of event-triggered synergistic controllers that guarantee both dwell-time transmission and global asymptotic stability. Through numerical simulations, we demonstrate the effectiveness of our controller applied to the problem of rigid body attitude stabilization.
comment: 9 pages, 2 figures, 1 table
Offline Reinforcement Learning for Microgrid Voltage Regulation ICLR 2025
This paper presents a study on using different offline reinforcement learning algorithms for microgrid voltage regulation with solar power penetration. When environment interaction is unviable due to technical or safety reasons, the proposed approach can still obtain an applicable model through offline-style training on a previously collected dataset, lowering the negative impact of lacking online environment interactions. Experiment results on the IEEE 33-bus system demonstrate the feasibility and effectiveness of the proposed approach on different offline datasets, including the one with merely low-quality experience.
comment: This paper has been accepted and presented at ICLR 2025 in Singapore, Apr. 28, 2025
Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots
Deployable structures inspired by origami offer lightweight, compact, and reconfigurable solutions for robotic and architectural applications. We present a geometric and mechanical framework for Yoshimura-Ori modules that supports a diverse set of metastable states, including newly identified asymmetric "pop-out" and "hyperfolded" configurations. These states are governed by three parameters -- tilt angle, phase shift, and slant height -- and enable discrete, programmable transformations. Using this model, we develop forward and inverse kinematic strategies to stack modules into deployable booms that approximate complex 3D shapes. We validate our approach through mechanical tests and demonstrate a tendon- and pneumatically-actuated Yoshimura Space Crane capable of object manipulation, solar tracking, and high load-bearing performance. A meter-scale solar charging station further illustrates the design's scalability. These results establish Yoshimura-Ori structures as a promising platform for adaptable, multifunctional deployable systems in both terrestrial and space environments.
Stability and Convergence Analysis of Multi-Agent Consensus with Communication Delays: A Lambert W Function Approach
This paper investigates the effect of constant time delay in weakly connected multi-agent systems modeled by double integrator dynamics. A novel analytical approach is proposed to establish an upper bound on the permissible time delay that ensures stability and consensus convergence. The analysis employs the Lambert W function method in higher-dimensional systems to derive explicit conditions under which consensus is achieved. The theoretical results are rigorously proven and provide insight into the allowable delay margins. The analysis applies to general leaderless undirected network topologies. The framework also accounts for complex and realistic delays, including non-commensurate communication delays. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
comment: short version submitted to IEEE L-CSS and CDC 2025
The Path Integral Bottleneck: Exploring the Control-Compute Tradeoff
Executing a control sequence requires some computation effort. Intuitively, a high-effort, fine-grained computation should result in better control (e.g. lower cost), whereas little to no computation effort would lead to worse control. To quantify and explore the tradeoff between control performance and compute effort, we present the Path Integral Bottleneck (PIB), a fusion of the Path Integral (PI) optimal control and Information Bottleneck (IB) frameworks. Both frameworks provide flexible and probabilistic descriptions of control. The PI does not limit itself to a particular control law, and the IB is not bound to any specific state encoding. Combining the generality of both frameworks enables us to produce an analytical description of the control-compute tradeoff. We provide PIB formulations for both continuous and discrete random variables. With these formulations, we can plot a tradeoff curve between performance and computation effort for any given plant description and control cost function. Simulations of a cart-pole for both the continuous and discrete variable cases reveal fundamental control-compute tradeoffs, exposing regions where the task performance-per-compute is higher than others.
comment: 7 pages, 3 Figures, Under Review
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
Quad-LCD: Layered Control Decomposition Enables Actuator-Feasible Quadrotor Trajectory Planning
In this work, we specialize contributions from prior work on data-driven trajectory generation for a quadrotor system with motor saturation constraints. When motors saturate in quadrotor systems, there is an ``uncontrolled drift" of the vehicle that results in a crash. To tackle saturation, we apply a control decomposition and learn a tracking penalty from simulation data consisting of low, medium and high-cost reference trajectories. Our approach reduces crash rates by around $49\%$ compared to baselines on aggressive maneuvers in simulation. On the Crazyflie hardware platform, we demonstrate feasibility through experiments that lead to successful flights. Motivated by the growing interest in data-driven methods to quadrotor planning, we provide open-source lightweight code with an easy-to-use abstraction of hardware platforms.
comment: 4 pages, 4 figures
A Virtual Admittance-Based Fault Current Limiting Method for Grid-Forming Inverters
Inverter-based resources (IBRs) are a key component in the ongoing modernization of power systems, with grid-forming (GFM) inverters playing a central role. Effective fault current limiting is a major challenge to modernizing power systems through increased penetration of GFM inverters. Due to their voltage-source nature, GFM inverters offer no direct control over the output current and, therefore, are susceptible to high fault currents. This vulnerability is especially pronounced during large phase jumps, a condition overlooked by most fault current limiting methods. This paper proposes a hybrid fault current limiting method implemented through a virtual admittance by leveraging the advantages of two virtual impedance (VI)-based methods tailored for three-phase faults and phase jump disturbances. Electromagnetic transient simulations conducted in MATLAB-Simulink demonstrate the method's effectiveness across various disturbances, validating its potential in single-loop GFM structures.
comment: 5 pages, 6 figures
From noisy observables to accurate ground state energies: a quantum classical signal subspace approach with denoising
We propose a hybrid quantum-classical algorithm for ground state energy (GSE) estimation that remains robust to highly noisy data and exhibits low sensitivity to hyperparameter tuning. Our approach -- Fourier Denoising Observable Dynamic Mode Decomposition (FDODMD) -- combines Fourier-based denoising thresholding to suppress spurious noise modes with observable dynamic mode decomposition (ODMD), a quantum-classical signal subspace method. By applying ODMD to an ensemble of denoised time-domain trajectories, FDODMD reliably estimates the system's eigenfrequencies. We also provide an error analysis of FDODMD. Numerical experiments on molecular systems demonstrate that FDODMD achieves convergence in high-noise regimes inaccessible to baseline methods under a limited quantum computational budget, while accelerating spectral estimation in intermediate-noise regimes. Importantly, this performance gain is entirely classical, requiring no additional quantum overhead and significantly reducing overall quantum resource demands.
Mesh Stability Guaranteed Rigid Body Networks Using Control and Topology Co-Design
Merging and splitting are of great significance for rigid body networks in making such networks reconfigurable. The main challenges lie in simultaneously ensuring the compositionality of the distributed controllers and the mesh stability of the entire network. To this end, we propose a decentralized control and topology co-design method for rigid body networks, which enables flexible joining and leaving of rigid bodies without the need to redesign the controllers for the entire network after such maneuvers. We first provide a centralized linear matrix inequality (LMI)-based control and topology co-design optimization of the rigid body networks with a formal mesh stability guarantee. Then, these centralized mesh stability constraints are made decentralized by a proposed alternative set of sufficient conditions. Using these decentralized mesh stability constraints and Sylvester's criterion-based decentralization techniques, the said centralized LMI problem is equivalently broken down into a set of smaller decentralized LMI problems that can be solved at each rigid body, enabling flexible merging/splitting of rigid bodies. Finally, the effectiveness of the proposed co-design method is illustrated based on a specifically developed simulator and a comparison study with respect to a state-of-the-art method.
comment: 12 pages, 7 figures
System Identification and Control Using Lyapunov-Based Deep Neural Networks without Persistent Excitation: A Concurrent Learning Approach
Deep Neural Networks (DNNs) are increasingly used in control applications due to their powerful function approximation capabilities. However, many existing formulations focus primarily on tracking error convergence, often neglecting the challenge of identifying the system dynamics using the DNN. This paper presents the first result on simultaneous trajectory tracking and online system identification using a DNN-based controller, without requiring persistent excitation. Two new concurrent learning adaptation laws are constructed for the weights of all the layers of the DNN, achieving convergence of the DNN's parameter estimates to a neighborhood of their ideal values, provided the DNN's Jacobian satisfies a finite-time excitation condition. A Lyapunov-based stability analysis is conducted to ensure convergence of the tracking error, weight estimation errors, and observer errors to a neighborhood of the origin. Simulations performed on a range of systems and trajectories, with the same initial and operating conditions, demonstrated 40.5% to 73.6% improvement in function approximation performance compared to the baseline, while maintaining a similar tracking error and control effort. Simulations evaluating function approximation capabilities on data points outside of the trajectory resulted in 58.88% and 74.75% improvement in function approximation compared to the baseline.
Control Invariant Sets for Neural Network Dynamical Systems and Recursive Feasibility in Model Predictive Control
Neural networks are powerful tools for data-driven modeling of complex dynamical systems, enhancing predictive capability for control applications. However, their inherent nonlinearity and black-box nature challenge control designs that prioritize rigorous safety and recursive feasibility guarantees. This paper presents algorithmic methods for synthesizing control invariant sets specifically tailored to neural network based dynamical models. These algorithms employ set recursion, ensuring termination after a finite number of iterations and generating subsets in which closed-loop dynamics are forward invariant, thus guaranteeing perpetual operational safety. Additionally, we propose model predictive control designs that integrate these control invariant sets into mixed-integer optimization, with guaranteed adherence to safety constraints and recursive feasibility at the computational level. We also present a comprehensive theoretical analysis examining the properties and guarantees of the proposed methods. Numerical simulations in an autonomous driving scenario demonstrate the methods' effectiveness in synthesizing control-invariant sets offline and implementing model predictive control online, ensuring safety and recursive feasibility.
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation
This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics, thereby eliminating a barrier when applying MPC to a renewables-dominated power system. Case study results on an industry park system and the IEEE 30-bus system demonstrate that using the diffusion model to augment the training dataset significantly improves load-forecasting accuracy, and the inferred system dynamics are applicable to the real-time grid operation with solar and wind.
comment: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM), which will be held in Austin, TX, July 27-31, 2025
On Sampling Time and Invariance
Invariant sets define regions of the state space where system constraints are always satisfied. The majority of numerical techniques for computing invariant sets have been developed for discrete-time systems with a fixed sampling time. Understanding how invariant sets change with sampling time is critical for designing adaptive-sampling control schemes that ensure constraint satisfaction. We introduce M-step hold control invariance, a generalization of traditional control invariance, and show its practical use to assess the link between control sampling frequency and constraint satisfaction. We robustify M-step hold control invariance against model mismatches and discretization errors, paving the way for adaptive-sampling control strategies.
Predictive Position Estimation for Remote Surgery under Packet Loss Using the Informer Framework
Accurate and real-time position estimation of the robotic arm on the patient's side is crucial for the success of remote robotic surgery in Tactile Internet environments. This paper proposes a predictive approach using the computationally efficient Transformer-based Informer model for position estimation, combined with a Four-State Hidden Markov Model (4-State HMM) to simulate realistic packet loss scenarios. The method effectively addresses network-induced delays, jitter, and packet loss, ensuring reliable performance in remote robotic surgery. The study evaluates the Informer model on the JIGSAWS dataset, demonstrating its capability to handle sequential data challenges caused by network uncertainties. Key features, including ProbSparse attention and a generative-style decoder, enhance prediction accuracy, computational speed, and memory efficiency. Results indicate that the proposed method achieves over 90 percent accuracy across varying network conditions. Furthermore, the Informer framework outperforms traditional models such as TCN, RNN, and LSTM, highlighting its suitability for real-time remote surgery applications.
comment: The paper is being withdrawn due to a methodological issue identified during the review process. Specifically, further evaluation revealed inconsistencies in the packet loss modeling and prediction performance analysis. We plan to revise and correct these aspects before considering resubmission
Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges
Power grid operation is becoming increasingly complex due to the rising integration of renewable energy sources and the need for more adaptive control strategies. Reinforcement Learning (RL) has emerged as a promising approach to power network control (PNC), offering the potential to enhance decision-making in dynamic and uncertain environments. The Learning To Run a Power Network (L2RPN) competitions have played a key role in accelerating research by providing standardized benchmarks and problem formulations, leading to rapid advancements in RL-based methods. This survey provides a comprehensive and structured overview of RL applications for power grid topology optimization, categorizing existing techniques, highlighting key design choices, and identifying gaps in current research. Additionally, we present a comparative numerical study evaluating the impact of commonly applied RL-based methods, offering insights into their practical effectiveness. By consolidating existing research and outlining open challenges, this survey aims to provide a foundation for future advancements in RL-driven power grid optimization.
comment: 60 pages, 26 figures, preprint
Towards Optimal Orders for Entanglement Swapping in Path Graphs: A Greedy Approach
This paper considers the problem of finding an optimal order for entanglement swapping in a heterogeneous path of quantum repeaters so as to maximize the path throughput defined as the delivery rate of end-to-end entanglements. The primary difficulty in addressing this problem lies in the vast array of possible swapping orders for large paths and the complexity of the expected throughput, which depends on the attributes of each node and edge along the path, as well as the order of swapping. To cope with these issues, we first propose simple approximations in estimating the swapping outcome between two entanglement distributions that can run in constant time, thereby providing an efficient approach for evaluating and comparing different swapping orders, allowing us to solve the problem exactly for small paths. Second, as the number of possible orders grows exponentially with the number of repeaters in the path, we develop an efficient heuristic based on the greedy selection of nodes to sequentially perform swaps according to their swapping scores, defined as the expected number of entanglements resulting from their swaps. The scores are local but dynamic in the sense that they depend not just on the entanglement distributions available on the path but also on prior swapping decisions. Finally, we illustrate the efficiency and effectiveness of our proposed model and approach through extensive experimentation conducted using a general quantum network simulator.
comment: 11 pages, 11 figures
Model-Based Closed-Loop Control Algorithm for Stochastic Partial Differential Equation Control
Neural operators have demonstrated promise in modeling and controlling systems governed by Partial Differential Equations (PDEs). Beyond PDEs, Stochastic Partial Differential Equations (SPDEs) play a critical role in modeling systems influenced by randomness, with applications in finance, physics, and beyond. However, controlling SPDE-governed systems remains a significant challenge. On the one hand, the regularity of the system's state (which can be intuitively understood as smoothness) deteriorates, making modeling and generalization more challenging. On the other hand, this stochasticity also renders control more unstable and thus less accurate. To address this gap, we propose the Model-Based Closed-Loop Control Algorithm (MB-CC), the first model-based closed-loop control method for SPDEs. MB-CC introduces two key innovations to enhance control robustness and efficiency: a Regularity Feature (RF) block and a closed-loop strategy with an operator-encoded policy network. The RF block, inspired by the regularity structure theory of SPDEs, addresses noise-induced irregularities by transforming the network's input, including the system state and noise-perturbed external forces, into a refined feature space for improved forward prediction. Compared to previous works using regularity features, we introduce a new parameterization, data augmentation, and extend the RF block as a plug-and-play component. Additionally, to achieve closed-loop control, we introduce an operator-encoded policy network to map the current state to optimal control, which integrates physical priors and swiftly makes decisions based on states returned by the environment. We conduct a systematic evaluation of MB-CC on two notable SPDEs, showcasing its effectiveness and efficiency. The ablation studies show its ability to handle stochasticity more effectively.
Certifying Lyapunov Stability of Black-Box Nonlinear Systems via Counterexample Guided Synthesis (Extended Version) SC
Finding Lyapunov functions to certify the stability of control systems has been an important topic for verifying safety-critical systems. Most existing methods on finding Lyapunov functions require access to the dynamics of the system. Accurately describing the complete dynamics of a control system however remains highly challenging in practice. Latest trend of using learning-enabled control systems further reduces the transparency. Hence, a method for black-box systems would have much wider applications. Our work stems from the recent idea of sampling and exploiting Lipschitz continuity to approximate the unknown dynamics. Given Lipschitz constants, one can derive a non-statistical upper bounds on approximation errors; hence a strong certification on this approximation can certify the unknown dynamics. We significantly improve this idea by directly approximating the Lie derivative of Lyapunov functions instead of the dynamics. We propose a framework based on the learner-verifier architecture from Counterexample-Guided Inductive Synthesis (CEGIS). Our insight of combining regional verification conditions and counterexample-guided sampling enables a guided search for samples to prove stability region-by-region. Our CEGIS algorithm further ensures termination. Our numerical experiments suggest that it is possible to prove the stability of 2D and 3D systems with a few thousands of samples. Our visualization also reveals the regions where the stability is difficult to prove. In comparison with the existing black-box approach, our approach at the best case requires less than 0.01% of samples.
comment: 30 pages, 3 figures. This is the extended version of the same paper published in the 28th International Conference on Hybrid Systems: Computation and Control (HSCC 2025). Add acknowledgements in v2
Gaming on Coincident Peak Shaving: Equilibrium and Strategic Behavior
Power system operators and electric utility companies often impose a coincident peak demand charge on customers when the aggregate system demand reaches its maximum. This charge incentivizes customers to strategically shift their peak usage away from the system's collective peak, which helps reduce stress on electricity infrastructure. In this paper, we develop a game-theoretic model to analyze how such strategic behavior affects overall system efficiency. We show that depending on the extent of customers' demand-shifting capabilities, the resulting coincident peak shaving game can exhibit concavity, quasi-concavity with discontinuities, or non-concavity with discontinuities. In a two-agent, two-period setting, we derive closed-form Nash equilibrium solutions for each scenario and generalize our findings to multi-agent contexts. We prove the stability of the equilibrium points and propose an algorithm for computing equilibrium outcomes under all game configurations. Our results indicate that the peak-shaving outcome at the equilibrium of the game model is comparable to the optimal outcome of the natural centralized model. However, there is a significant loss in efficiency. Under quasi-concave and non-concave conditions, this inefficiency grows with increased customer flexibility and larger disparities in marginal shifting costs; we also examine how the number of agents influences system performance. Finally, numerical simulations with real-world applications validate our theoretical insights.
A Modular Safety Filter for Safety-Certified Cyber-Physical Systems
Nowadays, many control systems are networked and embed communication and computation capabilities. Such control architectures are prone to cyber attacks on the cyberinfrastructure. Consequently, there is an impellent need to develop solutions to preserve the plant's safety against potential attacks. To ensure safety, this paper introduces a modular safety filter approach that is effective for various cyber-attack types. This solution can be implemented in combination with existing control and detection algorithms, effectively separating safety from performance. The safety filter does not require information on the received command's reliability or the anomaly detector's feature. It can be implemented in conjunction with high-performance, resilient controllers to achieve both high performance during normal operation and safety during an attack. As an illustrative example, we have shown the effectiveness of the proposed design considering a multi-agent formation task involving 20 mobile robots. The simulation results testify that the safety filter operates effectively during undetectable, intelligent attacks.
A New Switched Reluctance Motor with Embedded Permanent Magnets for Transportation Electrification
A new three-phase hybrid-excited multi-tooth switched reluctance motor with embedded permanent magnets is proposed, capable of achieving higher torque density for transportation electrification applications. Operating principles and design considerations are discussed. A magnetic equivalent circuit is developed. Finite element method is employed in the field analysis. The advantages of the proposed topology over existing designs for switched reluctance motors and flux switching motors are presented. Finally, the optimized design is prototyped to experimentally confirm the design and simulation results.
Linear Model of Aggregated Homogeneous Energy Storage Elements with Realizable Dispatch Guarantees
To optimize the dispatch of batteries, a model is required that can predict the state of charge (SOC) trajectory for a chosen open-loop power schedule to ensure admissibility (i.e., that schedule can be realized). However, battery dispatch optimization is inherently challenging since batteries cannot simultaneously charge and discharge, which begets a non-convex complementarity constraint. In this paper, we develop a novel composition of energy storage elements that can charge or discharge independently and provide a sufficient linear energy storage model of the composite battery. This permits convex optimization of the composite battery SOC trajectory while guaranteeing admissibility of the resulting (aggregated) power schedule and its disaggregation to the individual energy storage elements.
Experimental evaluation of xApp Conflict Mitigation Framework in O-RAN: Insights from Testbed deployment in OTIC
Conflict Mitigation (CM) in Open Radio Access Network (O-RAN) is a topic that is gaining importance as commercial O-RAN deployments become more complex. Although research on CM is already covered in terms of simulated network scenarios, it lacks validation using real-world deployment and Over The Air (OTA) Radio Frequency (RF) transmission. Our objective is to conduct the first assessment of the Conflict Mitigation Framework (CMF) for O-RAN using a real-world testbed and OTA RF transmission. This paper presents results of an experiment using a dedicated testbed built in an O-RAN Open Test and Integration Center (OTIC) to confirm the validity of one of the Conflict Resolution (CR) schemes proposed by existing research. The results show that the implemented conflict detection and resolution mechanisms allow a significant improvement in network operation stability by reducing the variability of the measured Downlink (DL) throughput by 78%.
Systems and Control (EESS)
Can On Body Sensing Be Spatial Adaptive?
Wearable sensors are typically affixed to specific locations on the human body, and their position remains static, only changing unintentionally due to motion artifacts. This static configuration introduces significant limitations. As a result, current systems miss the opportunity to capture dynamic physiological data from diverse body regions. This research investigates the potential of developing movable sensors that adaptively reposition themselves to sample different areas of interest on the body, addressing gaps in spatial coverage. We designed, developed, and fabricated a 3 x 3 matrix platform to support moving sensors from one location to another. We validated the feasibility through simulations on a matrix of up to 9 x 9 locations with up to 16 concurrent sensors and real-world prototype characterization.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines represent complex highly nonlinear dynamical systems. Deriving their physics-based models can be challenging as it requires performance characteristics, that are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in a better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.
comment: 51 pages, 28 figures
AutoCam: Hierarchical Path Planning for an Autonomous Auxiliary Camera in Surgical Robotics
Incorporating an autonomous auxiliary camera into robot-assisted minimally invasive surgery (RAMIS) enhances spatial awareness and eliminates manual viewpoint control. Existing path planning methods for auxiliary cameras track two-dimensional surgical features but do not simultaneously account for camera orientation, workspace constraints, and robot joint limits. This study presents AutoCam: an automatic auxiliary camera placement method to improve visualization in RAMIS. Implemented on the da Vinci Research Kit, the system uses a priority-based, workspace-constrained control algorithm that combines heuristic geometric placement with nonlinear optimization to ensure robust camera tracking. A user study (N=6) demonstrated that the system maintained 99.84% visibility of a salient feature and achieved a pose error of 4.36 $\pm$ 2.11 degrees and 1.95 $\pm$ 5.66 mm. The controller was computationally efficient, with a loop time of 6.8 $\pm$ 12.8 ms. An additional pilot study (N=6), where novices completed a Fundamentals of Laparoscopic Surgery training task, suggests that users can teleoperate just as effectively from AutoCam's viewpoint as from the endoscope's while still benefiting from AutoCam's improved visual coverage of the scene. These results indicate that an auxiliary camera can be autonomously controlled using the da Vinci patient-side manipulators to track a salient feature, laying the groundwork for new multi-camera visualization methods in RAMIS.
comment: 13 pages, 9 figures
Unlocking Innate Computing Abilities in Electric Grids
High energy consumption of artificial intelligence has gained momentum worldwide, which necessitates major investments on expanding efficient and carbon-neutral generation and data center infrastructure in electric power grids. Going beyond the conventional ideation, this article unleashes innate computational abilities in the power grid network circuits itself. By programming power electronic converters (PECs) to mimic biological neurons, we sustainably transform power grids into a neural network and enable it to optimize, compute and make data-driven decisions using distributed PECs. Instead of seen merely as an energy delivery platform, this article conceptualizes a novel application for electric grid to be used as a computing asset without affecting its operation. To illustrate its computational abilities, we solve a affine transformation task in a microgrid with five PECs. By encoding the digital data into the control of PECs, our preliminary results conclude that computing using electric grids does not disturb its operation. From a scientific perspective, this work fundamentally merges energy and computing optimization theories by harnessing inherent high-dimensional computational relationships in electric grids.
Spatially Selective Active Noise Control for Open-fitting Hearables with Acausal Optimization
Recent advances in active noise control have enabled the development of hearables with spatial selectivity, which actively suppress undesired noise while preserving desired sound from specific directions. In this work, we propose an improved approach to spatially selective active noise control that incorporates acausal relative impulse responses into the optimization process, resulting in significantly improved performance over the causal design. We evaluate the system through simulations using a pair of open-fitting hearables with spatially localized speech and noise sources in an anechoic environment. Performance is evaluated in terms of speech distortion, noise reduction, and signal-to-noise ratio improvement across different delays and degrees of acausality. Results show that the proposed acausal optimization consistently outperforms the causal approach across all metrics and scenarios, as acausal filters more effectively characterize the response of the desired source.
comment: Forum Acusticum/Euronoise 2025
A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024
Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems. This paper presents the solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024). The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market. Key components include: (1) a stacking-based approach combining sister forecasts from various Numerical Weather Predictions (NWPs) to provide wind power forecasts, (2) an online solar post-processing model to address the distribution shift in the online test set caused by increased solar capacity, (3) a probabilistic aggregation method for accurate quantile forecasts of hybrid generation, and (4) a stochastic trading strategy to maximize expected trading revenue considering uncertainties in electricity prices. This paper also explores the potential of end-to-end learning to further enhance the trading revenue by adjusting the distribution of forecast errors. Detailed case studies are provided to validate the effectiveness of these proposed methods. Code for all mentioned methods is available for reproduction and further research in both industry and academia.
comment: Solution description of IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom)
pc-dbCBS: Kinodynamic Motion Planning of Physically-Coupled Robot Teams
Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.
comment: This work has been submitted to the IEEE for possible publication
A Comparative Study of SMT and MILP for the Nurse Rostering Problem
The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.
comment: 6 pages, 3 figures
Country wide Shared FibreBased Infrastructure for Dissemination of Precise Time, Coherent Optical Frequency with Vibration Sensing
With the increasing demand for ultra-precise time synchronization and frequency dissemination across various scientific, industrial, and communication fields, the Czech Republic has developed an innovative, non-commercial fiber-based infrastructure. This infrastructure serves as a shared platform, utilizing optical fibers to enable high-precision timing, coherent frequency transfer, and a newly implemented vibrational sensing capability. The project also addresses challenges posed by classical communication noise-particularly from Raman scattering-on quantum channels, especially for Quantum Key Distribution (QKD). By strategically separating classical and quantum channels into distinct wavelength bands, such as the C-band and O-band, the infrastructure achieves minimal interference while enabling multiple concurrent applications over shared fiber lines.
comment: 4 pages, 2 figures, conference paper
Self Clocked Digital LDO for Cryogenic Power Management in 22nm FDSOI with 98 Percent Efficiency
A universal quantum computer~(QC), though promising ground breaking solutions to complex problems, still faces several challenges with respect to scalability. Current state-of-the-art QC use a great quantity of cables to connect the physical qubits, situated in the cryogenic temperature, to room temperature electronics. Integrated cryogenic electronics together with semiconductor spin qubits is one way closer for scalability. Such a scalable quantum computer can have qubits and the control electronics at 4K stage. Being at 4K, more thermal dissipation is allowed without overloading the cooling capability of the fridge. Still, control and power circuitry is expected to be highly efficient. While commercial CMOS technologies are found to be operatable at \qty{}{mK}, lack of reliable cryogenic models while designing, increased mismatches at cryo temperatures makes the design challenging and risky. Using an FDSOI technology with backgate biasing to compensate for the threshold voltage drift happening at cryo~(compensating around 200mV) and digital circuitry is a way to address this challenge. In this work, a self-clocked digital low dropout regulator (DLDO) is designed in FDSOI for high power efficient, variation tolerant regulator to supply cryogenic circuits for Quantum computing. The proposed digital LDO is more resilient to mismatch and having self clocking and close and fine loops addresses the power efficiency and faster transient response.
Angle diversity receiver as a key enabler for reliable ORIS-based Visible Light Communication
Visible Light Communication (VLC) offers a promising solution to satisfy the increasing demand for wireless data. However, link blockages remain a significant challenge. This paper addresses this issue by investigating the combined use of angle diversity receivers (ADRs) and optical reconfigurable intelligent surfaces (ORISs) in multiuser VLC systems. We consider ORIS elements as small movable mirrors. We demonstrate the complementarity of ADR and ORIS in mitigating link blockages, as well as the advantages of using a larger number of ORIS elements due to the increased field-of-view (FoV) at the receiver enabled by the ADR. An optimization algorithm is proposed to maximize the minimum signal-to-noise power ratio (SNR) to deploy a fair communication network. Numerical results show that integrating ADR and ORIS significantly enhances VLC communication performance, achieving an SNR gain of up to 30 dB compared to a system without ORIS, and mitigating communication outages produced by link blockages or out-of-FoV received signals. We also prove that an ADR with a single tier of photodiodes is sufficient to complement ORIS-assisted VLC.
Discontinuous integro-differential control systems with sliding modes
The paper deals with analysis and design sliding mode control systems modeled by integro-differential equations. Filippov method and equivalent control approach are extended to a class of nonlinear discontinuous integro-differential equations. Sliding mode control algorithm is designed for a control system with distributed input delay. The obtained results are illustrated by numerical example.
DB InfraGO's Automated Dispatching Assistant ADA-PMB
As railway infrastructure manager, DB InfraGO AG is faced with the challenge of offering fluid and punctual operation despite rising demand and increased construction activity. The high capacity utilisation, especially in the core network sections, causes delays to be propagated quickly and widely across the entire network. Up to now, conflicts between train runs can be identified automatically, but dispatching measures have been based on past human experience. An automated dispatching assistance system is currently being piloted to provide support for train dispatchers in their work. The aim is to offer them helpful dispatching recommendations, particularly in stressful situations with a high conflict density in the network section under consideration, in order to ensure the most efficient operation of the system. The recommendations are currently displayed separately alongside the central control system. In future, they will be integrated into the central control system, which will significantly simplify communication between the train dispatcher and signal setter. Further development steps for the integration process are also presented and discussed.
Improving Power Systems Controllability via Edge Centrality Measures
Improving the controllability of power networks is crucial as they are highly complex networks operating in synchrony; even minor perturbations can cause desynchronization and instability. To that end, one needs to assess the criticality of key network components (buses and lines) in terms of their impact on system performance. Traditional methods to identify the key nodes/edges in power networks often rely on static centrality measures based on the network's topological structure ignoring the network's dynamic behavior. In this paper, using multi-machine power network models and a new control-theoretic edge centrality matrix (ECM) approach, we: (i) quantify the influence of edges (i.e., the line susceptances) in terms of controllability performance metrics, (ii) identify the most influential lines, and (iii) compute near-optimal edge modifications that improve the power network controllability. Employing various IEEE power network benchmarks, we validate the effectiveness of the ECM-based algorithm and demonstrate improvements in system reachability, control, and damping performance.
Planar Herding of Multiple Evaders with a Single Herder
A planar herding problem is considered, where a superior pursuer herds a flock of non-cooperative, inferior evaders around a predefined target point. An inverse square law of repulsion is assumed between the pursuer and each evader. Two classes of pursuer trajectories are proposed: (i) a constant angular-velocity spiral, and (ii) a constant angular-velocity circle, both centered around the target point. For the spiraling pursuer, the radial velocity is dynamically adjusted based on a feedback law that depends on the instantaneous position of the evader, which is located at the farthest distance from the target at the start of the game. It is shown that, under suitable choices of the model parameters, all the evaders are herded into an arbitrarily small limit cycle around the target point. Meanwhile, the pursuer also converges onto a circular trajectory around the target. The conditions for the stability of these limit cycles are derived. For the circling pursuer, similar guarantees are provided along with explicit formulas for the radii of the limit cycles.
comment: 12 pages, 16 figures
Fast Heuristic Scheduling and Trajectory Planning for Robotic Fruit Harvesters with Multiple Cartesian Arms
This work proposes a fast heuristic algorithm for the coupled scheduling and trajectory planning of multiple Cartesian robotic arms harvesting fruits. Our method partitions the workspace, assigns fruit-picking sequences to arms, determines tight and feasible fruit-picking schedules and vehicle travel speed, and generates smooth, collision-free arm trajectories. The fruit-picking throughput achieved by the algorithm was assessed using synthetically generated fruit coordinates and a harvester design featuring up to 12 arms. The throughput increased monotonically as more arms were added. Adding more arms when fruit densities were low resulted in diminishing gains because it took longer to travel from one fruit to another. However, when there were enough fruits, the proposed algorithm achieved a linear speedup as the number of arms increased.
comment: This work will be submitted to the IEEE for possible publication
Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations
Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, whereas they could introduce the ``leaking corner issue", leading to inaccuracies in the results. In this paper, we define the ``leaking corner issue" in terms of value functions, propose and prove a necessary condition for its occurrence. We then use these theoretical contributions to introduce a new local updating method that efficiently corrects inaccurate value functions while maintaining the computational efficiency of the dimensionality reduction approaches. We demonstrate the effectiveness of our method through numerical simulations. Although we validate our method with the self-contained subsystem decomposition (SCSD), our approach is applicable to other dimensionality reduction techniques that introduce the ``leaking corners".
comment: 7 pages, Submitted to Conference on Decision and Control (CDC)
Provably safe and human-like car-following behaviors: Part 2. A parsimonious multi-phase model with projected braking
Ensuring safe and human-like trajectory planning for automated vehicles amidst real-world uncertainties remains a critical challenge. While existing car-following models often struggle to consistently provide rigorous safety proofs alongside human-like acceleration and deceleration patterns, we introduce a novel multi-phase projection-based car-following model. This model is designed to balance safety and performance by incorporating bounded acceleration and deceleration rates while emulating key human driving principles. Building upon a foundation of fundamental driving principles and a multi-phase dynamical systems analysis (detailed in Part 1 of this study \citep{jin2025WA20-02_Part1}), we first highlight the limitations of extending standard models like Newell's with simple bounded deceleration. Inspired by human drivers' anticipatory behavior, we mathematically define and analyze projected braking profiles for both leader and follower vehicles, establishing safety criteria and new phase definitions based on the projected braking lead-vehicle problem. The proposed parsimonious model combines an extended Newell's model for nominal driving with a new control law for scenarios requiring projected braking. Using speed-spacing phase plane analysis, we provide rigorous mathematical proofs of the model's adherence to defined safe and human-like driving principles, including collision-free operation, bounded deceleration, and acceptable safe stopping distance, under reasonable initial conditions. Numerical simulations validate the model's superior performance in achieving both safety and human-like braking profiles for the stationary lead-vehicle problem. Finally, we discuss the model's implications and future research directions.
comment: 27 pages, 4 figures
Provably safe and human-like car-following behaviors: Part 1. Analysis of phases and dynamics in standard models
Trajectory planning is essential for ensuring safe driving in the face of uncertainties related to communication, sensing, and dynamic factors such as weather, road conditions, policies, and other road users. Existing car-following models often lack rigorous safety proofs and the ability to replicate human-like driving behaviors consistently. This article applies multi-phase dynamical systems analysis to well-known car-following models to highlight the characteristics and limitations of existing approaches. We begin by formulating fundamental principles for safe and human-like car-following behaviors, which include zeroth-order principles for comfort and minimum jam spacings, first-order principles for speeds and time gaps, and second-order principles for comfort acceleration/deceleration bounds as well as braking profiles. From a set of these zeroth- and first-order principles, we derive Newell's simplified car-following model. Subsequently, we analyze phases within the speed-spacing plane for the stationary lead-vehicle problem in Newell's model and its extensions, which incorporate both bounded acceleration and deceleration. We then analyze the performance of the Intelligent Driver Model and the Gipps model. Through this analysis, we highlight the limitations of these models with respect to some of the aforementioned principles. Numerical simulations and empirical observations validate the theoretical insights. Finally, we discuss future research directions to further integrate safety, human-like behaviors, and vehicular automation in car-following models, which are addressed in Part 2 of this study \citep{jin2025WA20-02_Part2}, where we develop a novel multi-phase projection-based car-following model that addresses the limitations identified here.
comment: 29 pages, 7 figures
Event-Triggered Synergistic Controllers with Dwell-Time Transmission
We propose novel event-triggered synergistic controllers for nonlinear continuous-time plants by incorporating event-triggered control into stabilizing synergistic controllers. We highlight that a naive application of common event-triggering conditions may not ensure dwell-time transmission due to the joint jumping dynamics of the closed-loop system. Under mild conditions, we develop a suite of event-triggered synergistic controllers that guarantee both dwell-time transmission and global asymptotic stability. Through numerical simulations, we demonstrate the effectiveness of our controller applied to the problem of rigid body attitude stabilization.
comment: 9 pages, 2 figures, 1 table
Offline Reinforcement Learning for Microgrid Voltage Regulation ICLR 2025
This paper presents a study on using different offline reinforcement learning algorithms for microgrid voltage regulation with solar power penetration. When environment interaction is unviable due to technical or safety reasons, the proposed approach can still obtain an applicable model through offline-style training on a previously collected dataset, lowering the negative impact of lacking online environment interactions. Experiment results on the IEEE 33-bus system demonstrate the feasibility and effectiveness of the proposed approach on different offline datasets, including the one with merely low-quality experience.
comment: This paper has been accepted and presented at ICLR 2025 in Singapore, Apr. 28, 2025
Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots
Deployable structures inspired by origami offer lightweight, compact, and reconfigurable solutions for robotic and architectural applications. We present a geometric and mechanical framework for Yoshimura-Ori modules that supports a diverse set of metastable states, including newly identified asymmetric "pop-out" and "hyperfolded" configurations. These states are governed by three parameters -- tilt angle, phase shift, and slant height -- and enable discrete, programmable transformations. Using this model, we develop forward and inverse kinematic strategies to stack modules into deployable booms that approximate complex 3D shapes. We validate our approach through mechanical tests and demonstrate a tendon- and pneumatically-actuated Yoshimura Space Crane capable of object manipulation, solar tracking, and high load-bearing performance. A meter-scale solar charging station further illustrates the design's scalability. These results establish Yoshimura-Ori structures as a promising platform for adaptable, multifunctional deployable systems in both terrestrial and space environments.
Stability and Convergence Analysis of Multi-Agent Consensus with Communication Delays: A Lambert W Function Approach
This paper investigates the effect of constant time delay in weakly connected multi-agent systems modeled by double integrator dynamics. A novel analytical approach is proposed to establish an upper bound on the permissible time delay that ensures stability and consensus convergence. The analysis employs the Lambert W function method in higher-dimensional systems to derive explicit conditions under which consensus is achieved. The theoretical results are rigorously proven and provide insight into the allowable delay margins. The analysis applies to general leaderless undirected network topologies. The framework also accounts for complex and realistic delays, including non-commensurate communication delays. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
comment: short version submitted to IEEE L-CSS and CDC 2025
The Path Integral Bottleneck: Exploring the Control-Compute Tradeoff
Executing a control sequence requires some computation effort. Intuitively, a high-effort, fine-grained computation should result in better control (e.g. lower cost), whereas little to no computation effort would lead to worse control. To quantify and explore the tradeoff between control performance and compute effort, we present the Path Integral Bottleneck (PIB), a fusion of the Path Integral (PI) optimal control and Information Bottleneck (IB) frameworks. Both frameworks provide flexible and probabilistic descriptions of control. The PI does not limit itself to a particular control law, and the IB is not bound to any specific state encoding. Combining the generality of both frameworks enables us to produce an analytical description of the control-compute tradeoff. We provide PIB formulations for both continuous and discrete random variables. With these formulations, we can plot a tradeoff curve between performance and computation effort for any given plant description and control cost function. Simulations of a cart-pole for both the continuous and discrete variable cases reveal fundamental control-compute tradeoffs, exposing regions where the task performance-per-compute is higher than others.
comment: 7 pages, 3 Figures, Under Review
Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation
Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.
Quad-LCD: Layered Control Decomposition Enables Actuator-Feasible Quadrotor Trajectory Planning
In this work, we specialize contributions from prior work on data-driven trajectory generation for a quadrotor system with motor saturation constraints. When motors saturate in quadrotor systems, there is an ``uncontrolled drift" of the vehicle that results in a crash. To tackle saturation, we apply a control decomposition and learn a tracking penalty from simulation data consisting of low, medium and high-cost reference trajectories. Our approach reduces crash rates by around $49\%$ compared to baselines on aggressive maneuvers in simulation. On the Crazyflie hardware platform, we demonstrate feasibility through experiments that lead to successful flights. Motivated by the growing interest in data-driven methods to quadrotor planning, we provide open-source lightweight code with an easy-to-use abstraction of hardware platforms.
comment: 4 pages, 4 figures
A Virtual Admittance-Based Fault Current Limiting Method for Grid-Forming Inverters
Inverter-based resources (IBRs) are a key component in the ongoing modernization of power systems, with grid-forming (GFM) inverters playing a central role. Effective fault current limiting is a major challenge to modernizing power systems through increased penetration of GFM inverters. Due to their voltage-source nature, GFM inverters offer no direct control over the output current and, therefore, are susceptible to high fault currents. This vulnerability is especially pronounced during large phase jumps, a condition overlooked by most fault current limiting methods. This paper proposes a hybrid fault current limiting method implemented through a virtual admittance by leveraging the advantages of two virtual impedance (VI)-based methods tailored for three-phase faults and phase jump disturbances. Electromagnetic transient simulations conducted in MATLAB-Simulink demonstrate the method's effectiveness across various disturbances, validating its potential in single-loop GFM structures.
comment: 5 pages, 6 figures
From noisy observables to accurate ground state energies: a quantum classical signal subspace approach with denoising
We propose a hybrid quantum-classical algorithm for ground state energy (GSE) estimation that remains robust to highly noisy data and exhibits low sensitivity to hyperparameter tuning. Our approach -- Fourier Denoising Observable Dynamic Mode Decomposition (FDODMD) -- combines Fourier-based denoising thresholding to suppress spurious noise modes with observable dynamic mode decomposition (ODMD), a quantum-classical signal subspace method. By applying ODMD to an ensemble of denoised time-domain trajectories, FDODMD reliably estimates the system's eigenfrequencies. We also provide an error analysis of FDODMD. Numerical experiments on molecular systems demonstrate that FDODMD achieves convergence in high-noise regimes inaccessible to baseline methods under a limited quantum computational budget, while accelerating spectral estimation in intermediate-noise regimes. Importantly, this performance gain is entirely classical, requiring no additional quantum overhead and significantly reducing overall quantum resource demands.
Mesh Stability Guaranteed Rigid Body Networks Using Control and Topology Co-Design
Merging and splitting are of great significance for rigid body networks in making such networks reconfigurable. The main challenges lie in simultaneously ensuring the compositionality of the distributed controllers and the mesh stability of the entire network. To this end, we propose a decentralized control and topology co-design method for rigid body networks, which enables flexible joining and leaving of rigid bodies without the need to redesign the controllers for the entire network after such maneuvers. We first provide a centralized linear matrix inequality (LMI)-based control and topology co-design optimization of the rigid body networks with a formal mesh stability guarantee. Then, these centralized mesh stability constraints are made decentralized by a proposed alternative set of sufficient conditions. Using these decentralized mesh stability constraints and Sylvester's criterion-based decentralization techniques, the said centralized LMI problem is equivalently broken down into a set of smaller decentralized LMI problems that can be solved at each rigid body, enabling flexible merging/splitting of rigid bodies. Finally, the effectiveness of the proposed co-design method is illustrated based on a specifically developed simulator and a comparison study with respect to a state-of-the-art method.
comment: 12 pages, 7 figures
System Identification and Control Using Lyapunov-Based Deep Neural Networks without Persistent Excitation: A Concurrent Learning Approach
Deep Neural Networks (DNNs) are increasingly used in control applications due to their powerful function approximation capabilities. However, many existing formulations focus primarily on tracking error convergence, often neglecting the challenge of identifying the system dynamics using the DNN. This paper presents the first result on simultaneous trajectory tracking and online system identification using a DNN-based controller, without requiring persistent excitation. Two new concurrent learning adaptation laws are constructed for the weights of all the layers of the DNN, achieving convergence of the DNN's parameter estimates to a neighborhood of their ideal values, provided the DNN's Jacobian satisfies a finite-time excitation condition. A Lyapunov-based stability analysis is conducted to ensure convergence of the tracking error, weight estimation errors, and observer errors to a neighborhood of the origin. Simulations performed on a range of systems and trajectories, with the same initial and operating conditions, demonstrated 40.5% to 73.6% improvement in function approximation performance compared to the baseline, while maintaining a similar tracking error and control effort. Simulations evaluating function approximation capabilities on data points outside of the trajectory resulted in 58.88% and 74.75% improvement in function approximation compared to the baseline.
Control Invariant Sets for Neural Network Dynamical Systems and Recursive Feasibility in Model Predictive Control
Neural networks are powerful tools for data-driven modeling of complex dynamical systems, enhancing predictive capability for control applications. However, their inherent nonlinearity and black-box nature challenge control designs that prioritize rigorous safety and recursive feasibility guarantees. This paper presents algorithmic methods for synthesizing control invariant sets specifically tailored to neural network based dynamical models. These algorithms employ set recursion, ensuring termination after a finite number of iterations and generating subsets in which closed-loop dynamics are forward invariant, thus guaranteeing perpetual operational safety. Additionally, we propose model predictive control designs that integrate these control invariant sets into mixed-integer optimization, with guaranteed adherence to safety constraints and recursive feasibility at the computational level. We also present a comprehensive theoretical analysis examining the properties and guarantees of the proposed methods. Numerical simulations in an autonomous driving scenario demonstrate the methods' effectiveness in synthesizing control-invariant sets offline and implementing model predictive control online, ensuring safety and recursive feasibility.
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation
This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics, thereby eliminating a barrier when applying MPC to a renewables-dominated power system. Case study results on an industry park system and the IEEE 30-bus system demonstrate that using the diffusion model to augment the training dataset significantly improves load-forecasting accuracy, and the inferred system dynamics are applicable to the real-time grid operation with solar and wind.
comment: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM), which will be held in Austin, TX, July 27-31, 2025
On Sampling Time and Invariance
Invariant sets define regions of the state space where system constraints are always satisfied. The majority of numerical techniques for computing invariant sets have been developed for discrete-time systems with a fixed sampling time. Understanding how invariant sets change with sampling time is critical for designing adaptive-sampling control schemes that ensure constraint satisfaction. We introduce M-step hold control invariance, a generalization of traditional control invariance, and show its practical use to assess the link between control sampling frequency and constraint satisfaction. We robustify M-step hold control invariance against model mismatches and discretization errors, paving the way for adaptive-sampling control strategies.
Predictive Position Estimation for Remote Surgery under Packet Loss Using the Informer Framework
Accurate and real-time position estimation of the robotic arm on the patient's side is crucial for the success of remote robotic surgery in Tactile Internet environments. This paper proposes a predictive approach using the computationally efficient Transformer-based Informer model for position estimation, combined with a Four-State Hidden Markov Model (4-State HMM) to simulate realistic packet loss scenarios. The method effectively addresses network-induced delays, jitter, and packet loss, ensuring reliable performance in remote robotic surgery. The study evaluates the Informer model on the JIGSAWS dataset, demonstrating its capability to handle sequential data challenges caused by network uncertainties. Key features, including ProbSparse attention and a generative-style decoder, enhance prediction accuracy, computational speed, and memory efficiency. Results indicate that the proposed method achieves over 90 percent accuracy across varying network conditions. Furthermore, the Informer framework outperforms traditional models such as TCN, RNN, and LSTM, highlighting its suitability for real-time remote surgery applications.
comment: The paper is being withdrawn due to a methodological issue identified during the review process. Specifically, further evaluation revealed inconsistencies in the packet loss modeling and prediction performance analysis. We plan to revise and correct these aspects before considering resubmission
Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges
Power grid operation is becoming increasingly complex due to the rising integration of renewable energy sources and the need for more adaptive control strategies. Reinforcement Learning (RL) has emerged as a promising approach to power network control (PNC), offering the potential to enhance decision-making in dynamic and uncertain environments. The Learning To Run a Power Network (L2RPN) competitions have played a key role in accelerating research by providing standardized benchmarks and problem formulations, leading to rapid advancements in RL-based methods. This survey provides a comprehensive and structured overview of RL applications for power grid topology optimization, categorizing existing techniques, highlighting key design choices, and identifying gaps in current research. Additionally, we present a comparative numerical study evaluating the impact of commonly applied RL-based methods, offering insights into their practical effectiveness. By consolidating existing research and outlining open challenges, this survey aims to provide a foundation for future advancements in RL-driven power grid optimization.
comment: 60 pages, 26 figures, preprint
Towards Optimal Orders for Entanglement Swapping in Path Graphs: A Greedy Approach
This paper considers the problem of finding an optimal order for entanglement swapping in a heterogeneous path of quantum repeaters so as to maximize the path throughput defined as the delivery rate of end-to-end entanglements. The primary difficulty in addressing this problem lies in the vast array of possible swapping orders for large paths and the complexity of the expected throughput, which depends on the attributes of each node and edge along the path, as well as the order of swapping. To cope with these issues, we first propose simple approximations in estimating the swapping outcome between two entanglement distributions that can run in constant time, thereby providing an efficient approach for evaluating and comparing different swapping orders, allowing us to solve the problem exactly for small paths. Second, as the number of possible orders grows exponentially with the number of repeaters in the path, we develop an efficient heuristic based on the greedy selection of nodes to sequentially perform swaps according to their swapping scores, defined as the expected number of entanglements resulting from their swaps. The scores are local but dynamic in the sense that they depend not just on the entanglement distributions available on the path but also on prior swapping decisions. Finally, we illustrate the efficiency and effectiveness of our proposed model and approach through extensive experimentation conducted using a general quantum network simulator.
comment: 11 pages, 11 figures
Model-Based Closed-Loop Control Algorithm for Stochastic Partial Differential Equation Control
Neural operators have demonstrated promise in modeling and controlling systems governed by Partial Differential Equations (PDEs). Beyond PDEs, Stochastic Partial Differential Equations (SPDEs) play a critical role in modeling systems influenced by randomness, with applications in finance, physics, and beyond. However, controlling SPDE-governed systems remains a significant challenge. On the one hand, the regularity of the system's state (which can be intuitively understood as smoothness) deteriorates, making modeling and generalization more challenging. On the other hand, this stochasticity also renders control more unstable and thus less accurate. To address this gap, we propose the Model-Based Closed-Loop Control Algorithm (MB-CC), the first model-based closed-loop control method for SPDEs. MB-CC introduces two key innovations to enhance control robustness and efficiency: a Regularity Feature (RF) block and a closed-loop strategy with an operator-encoded policy network. The RF block, inspired by the regularity structure theory of SPDEs, addresses noise-induced irregularities by transforming the network's input, including the system state and noise-perturbed external forces, into a refined feature space for improved forward prediction. Compared to previous works using regularity features, we introduce a new parameterization, data augmentation, and extend the RF block as a plug-and-play component. Additionally, to achieve closed-loop control, we introduce an operator-encoded policy network to map the current state to optimal control, which integrates physical priors and swiftly makes decisions based on states returned by the environment. We conduct a systematic evaluation of MB-CC on two notable SPDEs, showcasing its effectiveness and efficiency. The ablation studies show its ability to handle stochasticity more effectively.
Certifying Lyapunov Stability of Black-Box Nonlinear Systems via Counterexample Guided Synthesis (Extended Version) SC
Finding Lyapunov functions to certify the stability of control systems has been an important topic for verifying safety-critical systems. Most existing methods on finding Lyapunov functions require access to the dynamics of the system. Accurately describing the complete dynamics of a control system however remains highly challenging in practice. Latest trend of using learning-enabled control systems further reduces the transparency. Hence, a method for black-box systems would have much wider applications. Our work stems from the recent idea of sampling and exploiting Lipschitz continuity to approximate the unknown dynamics. Given Lipschitz constants, one can derive a non-statistical upper bounds on approximation errors; hence a strong certification on this approximation can certify the unknown dynamics. We significantly improve this idea by directly approximating the Lie derivative of Lyapunov functions instead of the dynamics. We propose a framework based on the learner-verifier architecture from Counterexample-Guided Inductive Synthesis (CEGIS). Our insight of combining regional verification conditions and counterexample-guided sampling enables a guided search for samples to prove stability region-by-region. Our CEGIS algorithm further ensures termination. Our numerical experiments suggest that it is possible to prove the stability of 2D and 3D systems with a few thousands of samples. Our visualization also reveals the regions where the stability is difficult to prove. In comparison with the existing black-box approach, our approach at the best case requires less than 0.01% of samples.
comment: 30 pages, 3 figures. This is the extended version of the same paper published in the 28th International Conference on Hybrid Systems: Computation and Control (HSCC 2025). Add acknowledgements in v2
Gaming on Coincident Peak Shaving: Equilibrium and Strategic Behavior
Power system operators and electric utility companies often impose a coincident peak demand charge on customers when the aggregate system demand reaches its maximum. This charge incentivizes customers to strategically shift their peak usage away from the system's collective peak, which helps reduce stress on electricity infrastructure. In this paper, we develop a game-theoretic model to analyze how such strategic behavior affects overall system efficiency. We show that depending on the extent of customers' demand-shifting capabilities, the resulting coincident peak shaving game can exhibit concavity, quasi-concavity with discontinuities, or non-concavity with discontinuities. In a two-agent, two-period setting, we derive closed-form Nash equilibrium solutions for each scenario and generalize our findings to multi-agent contexts. We prove the stability of the equilibrium points and propose an algorithm for computing equilibrium outcomes under all game configurations. Our results indicate that the peak-shaving outcome at the equilibrium of the game model is comparable to the optimal outcome of the natural centralized model. However, there is a significant loss in efficiency. Under quasi-concave and non-concave conditions, this inefficiency grows with increased customer flexibility and larger disparities in marginal shifting costs; we also examine how the number of agents influences system performance. Finally, numerical simulations with real-world applications validate our theoretical insights.
A Modular Safety Filter for Safety-Certified Cyber-Physical Systems
Nowadays, many control systems are networked and embed communication and computation capabilities. Such control architectures are prone to cyber attacks on the cyberinfrastructure. Consequently, there is an impellent need to develop solutions to preserve the plant's safety against potential attacks. To ensure safety, this paper introduces a modular safety filter approach that is effective for various cyber-attack types. This solution can be implemented in combination with existing control and detection algorithms, effectively separating safety from performance. The safety filter does not require information on the received command's reliability or the anomaly detector's feature. It can be implemented in conjunction with high-performance, resilient controllers to achieve both high performance during normal operation and safety during an attack. As an illustrative example, we have shown the effectiveness of the proposed design considering a multi-agent formation task involving 20 mobile robots. The simulation results testify that the safety filter operates effectively during undetectable, intelligent attacks.
A New Switched Reluctance Motor with Embedded Permanent Magnets for Transportation Electrification
A new three-phase hybrid-excited multi-tooth switched reluctance motor with embedded permanent magnets is proposed, capable of achieving higher torque density for transportation electrification applications. Operating principles and design considerations are discussed. A magnetic equivalent circuit is developed. Finite element method is employed in the field analysis. The advantages of the proposed topology over existing designs for switched reluctance motors and flux switching motors are presented. Finally, the optimized design is prototyped to experimentally confirm the design and simulation results.
Linear Model of Aggregated Homogeneous Energy Storage Elements with Realizable Dispatch Guarantees
To optimize the dispatch of batteries, a model is required that can predict the state of charge (SOC) trajectory for a chosen open-loop power schedule to ensure admissibility (i.e., that schedule can be realized). However, battery dispatch optimization is inherently challenging since batteries cannot simultaneously charge and discharge, which begets a non-convex complementarity constraint. In this paper, we develop a novel composition of energy storage elements that can charge or discharge independently and provide a sufficient linear energy storage model of the composite battery. This permits convex optimization of the composite battery SOC trajectory while guaranteeing admissibility of the resulting (aggregated) power schedule and its disaggregation to the individual energy storage elements.
Experimental evaluation of xApp Conflict Mitigation Framework in O-RAN: Insights from Testbed deployment in OTIC
Conflict Mitigation (CM) in Open Radio Access Network (O-RAN) is a topic that is gaining importance as commercial O-RAN deployments become more complex. Although research on CM is already covered in terms of simulated network scenarios, it lacks validation using real-world deployment and Over The Air (OTA) Radio Frequency (RF) transmission. Our objective is to conduct the first assessment of the Conflict Mitigation Framework (CMF) for O-RAN using a real-world testbed and OTA RF transmission. This paper presents results of an experiment using a dedicated testbed built in an O-RAN Open Test and Integration Center (OTIC) to confirm the validity of one of the Conflict Resolution (CR) schemes proposed by existing research. The results show that the implemented conflict detection and resolution mechanisms allow a significant improvement in network operation stability by reducing the variability of the measured Downlink (DL) throughput by 78%.
Robotics
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist robot policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that enhance the policy while dropping data that degrade it. To avoid performing expensive rollouts in the environment during selection, we use a novel surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on a suite of more than 60 simulation and real-world manipulation tasks - most notably showing successful data selection from the Open X-Embodiment datasets-demonstrating consistent gains in success rates and superior performance over multiple baselines. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics. More information at https://robin-lab.cs.utexas.edu/datamodels4imitation/
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm-human teleoperation-remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high visual fidelity robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab but with collision modeling off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations. Project page: https://real2render2real.com
VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation
While vision-language models have advanced significantly, their application in language-conditioned robotic manipulation is still underexplored, especially for contact-rich tasks that extend beyond visually dominant pick-and-place scenarios. To bridge this gap, we introduce Vision-Tactile-Language-Action model, a novel framework that enables robust policy generation in contact-intensive scenarios by effectively integrating visual and tactile inputs through cross-modal language grounding. A low-cost, multi-modal dataset has been constructed in a simulation environment, containing vision-tactile-action-instruction pairs specifically designed for the fingertip insertion task. Furthermore, we introduce Direct Preference Optimization (DPO) to offer regression-like supervision for the VTLA model, effectively bridging the gap between classification-based next token prediction loss and continuous robotic tasks. Experimental results show that the VTLA model outperforms traditional imitation learning methods (e.g., diffusion policies) and existing multi-modal baselines (TLA/VLA), achieving over 90% success rates on unseen peg shapes. Finally, we conduct real-world peg-in-hole experiments to demonstrate the exceptional Sim2Real performance of the proposed VTLA model. For supplementary videos and results, please visit our project website: https://sites.google.com/view/vtla
Learning Long-Context Diffusion Policies via Past-Token Prediction
Reasoning over long sequences of observations and actions is essential for many robotic tasks. Yet, learning effective long-context policies from demonstrations remains challenging. As context length increases, training becomes increasingly expensive due to rising memory demands, and policy performance often degrades as a result of spurious correlations. Recent methods typically sidestep these issues by truncating context length, discarding historical information that may be critical for subsequent decisions. In this paper, we propose an alternative approach that explicitly regularizes the retention of past information. We first revisit the copycat problem in imitation learning and identify an opposite challenge in recent diffusion policies: rather than over-relying on prior actions, they often fail to capture essential dependencies between past and future actions. To address this, we introduce Past-Token Prediction (PTP), an auxiliary task in which the policy learns to predict past action tokens alongside future ones. This regularization significantly improves temporal modeling in the policy head, with minimal reliance on visual representations. Building on this observation, we further introduce a multistage training strategy: pre-train the visual encoder with short contexts, and fine-tune the policy head using cached long-context embeddings. This strategy preserves the benefits of PTP while greatly reducing memory and computational overhead. Finally, we extend PTP into a self-verification mechanism at test time, enabling the policy to score and select candidates consistent with past actions during inference. Experiments across four real-world and six simulated tasks demonstrate that our proposed method improves the performance of long-context diffusion policies by 3x and accelerates policy training by more than 10x.
comment: Videos are available at https://long-context-dp.github.io
Distilling Realizable Students from Unrealizable Teachers
We study policy distillation under privileged information, where a student policy with only partial observations must learn from a teacher with full-state access. A key challenge is information asymmetry: the student cannot directly access the teacher's state space, leading to distributional shifts and policy degradation. Existing approaches either modify the teacher to produce realizable but sub-optimal demonstrations or rely on the student to explore missing information independently, both of which are inefficient. Our key insight is that the student should strategically interact with the teacher --querying only when necessary and resetting from recovery states --to stay on a recoverable path within its own observation space. We introduce two methods: (i) an imitation learning approach that adaptively determines when the student should query the teacher for corrections, and (ii) a reinforcement learning approach that selects where to initialize training for efficient exploration. We validate our methods in both simulated and real-world robotic tasks, demonstrating significant improvements over standard teacher-student baselines in training efficiency and final performance. The project website is available at : https://portal-cornell.github.io/CritiQ_ReTRy/
Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps
Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.
Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities ICRA
The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics in their environments. However, existing FM-enabled robots primary operate in closed-world settings, where the robot is given a full prior map or has a full view of its workspace. This paper addresses the deployment of FM-enabled robots in the field, where missions often require a robot to operate in large-scale and unstructured environments. To effectively accomplish these missions, robots must actively explore their environments, navigate obstacle-cluttered terrain, handle unexpected sensor inputs, and operate with compute constraints. We discuss recent deployments of SPINE, our LLM-enabled autonomy framework, in field robotic settings. To the best of our knowledge, we present the first demonstration of large-scale LLM-enabled robot planning in unstructured environments with several kilometers of missions. SPINE is agnostic to a particular LLM, which allows us to distill small language models capable of running onboard size, weight and power (SWaP) limited platforms. Via preliminary model distillation work, we then present the first language-driven UAV planner using on-device language models. We conclude our paper by proposing several promising directions for future research.
comment: Accepted to the IEEE ICRA Workshop on Field Robotics 2025
aUToPath: Unified Planning and Control for Autonomous Vehicles in Urban Environments Using Hybrid Lattice and Free-Space Search
This paper presents aUToPath, a unified online framework for global path-planning and control to address the challenge of autonomous navigation in cluttered urban environments. A key component of our framework is a novel hybrid planner that combines pre-computed lattice maps with dynamic free-space sampling to efficiently generate optimal driveable corridors in cluttered scenarios. Our system also features sequential convex programming (SCP)-based model predictive control (MPC) to refine the corridors into smooth, dynamically consistent trajectories. A single optimization problem is used to both generate a trajectory and its corresponding control commands; this addresses limitations of decoupled approaches by guaranteeing a safe and feasible path. Simulation results of the novel planner on randomly generated obstacle-rich scenarios demonstrate the success rate of a free-space Adaptively Informed Trees* (AIT*)-based planner, and runtimes comparable to a lattice-based planner. Real-world experiments of the full system on a Chevrolet Bolt EUV further validate performance in dense obstacle fields, demonstrating no violations of traffic, kinematic, or vehicle constraints, and a 100% success rate across eight trials.
comment: 9 pages, 10 figures. Tanmay P. Patel, Connor Wilson, and Ellina R. Zhang contributed equally
Streaming Multi-agent Pathfinding IJCAI2025
The task of the multi-agent pathfinding (MAPF) problem is to navigate a team of agents from their start point to the goal points. However, this setup is unsuitable in the assembly line scenario, which is periodic with a long working hour. To address this issue, the study formalizes the streaming MAPF (S-MAPF) problem, which assumes that the agents in the same agent stream have a periodic start time and share the same action sequence. The proposed solution, Agent Stream Conflict-Based Search (ASCBS), is designed to tackle this problem by incorporating a cyclic vertex/edge constraint to handle conflicts. Additionally, this work explores the potential usage of the disjoint splitting strategy within ASCBS. Experimental results indicate that ASCBS surpasses traditional MAPF solvers in terms of runtime for scenarios with prolonged working hours.
comment: to be published in IJCAI2025
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU
We present a method for training multi-task vision-language robotic diffusion policies that reduces training time and memory usage by an order of magnitude. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: image generation targets are high-dimensional, while robot actions lie in a much lower-dimensional space. Meanwhile, the vision-language conditions for action generation remain high-dimensional. Our approach, Mini-Diffuser, exploits this asymmetry by introducing Level-2 minibatching, which pairs multiple noised action samples with each vision-language condition, instead of the conventional one-to-one sampling strategy. To support this batching scheme, we introduce architectural adaptations to the diffusion transformer that prevent information leakage across samples while maintaining full conditioning access. In RLBench simulations, Mini-Diffuser achieves 95\% of the performance of state-of-the-art multi-task diffusion policies, while using only 5\% of the training time and 7\% of the memory. Real-world experiments further validate that Mini-Diffuser preserves the key strengths of diffusion-based policies, including the ability to model multimodal action distributions and produce behavior conditioned on diverse perceptual inputs. Code available at github.com/utomm/mini-diffuse-actor.
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation
Large Language Models (LLMs) show growing promise in autonomous driving by reasoning over complex traffic scenarios to generate path plans. However, their tendencies toward overconfidence, and hallucinations raise critical safety concerns. We introduce SafePath, a modular framework that augments LLM-based path planning with formal safety guarantees using conformal prediction. SafePath operates in three stages. In the first stage, we use an LLM that generates a set of diverse candidate paths, exploring possible trajectories based on agent behaviors and environmental cues. In the second stage, SafePath filters out high-risk trajectories while guaranteeing that at least one safe option is included with a user-defined probability, through a multiple-choice question-answering formulation that integrates conformal prediction. In the final stage, our approach selects the path with the lowest expected collision risk when uncertainty is low or delegates control to a human when uncertainty is high. We theoretically prove that SafePath guarantees a safe trajectory with a user-defined probability, and we show how its human delegation rate can be tuned to balance autonomy and safety. Extensive experiments on nuScenes and Highway-env show that SafePath reduces planning uncertainty by 77\% and collision rates by up to 70\%, demonstrating effectiveness in making LLM-driven path planning more safer.
Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion
Recent studies have proved that imitation learning shows strong potential in the field of robotic manipulation. However, existing methods still struggle with precision manipulation task and rely on inefficient image/point cloud observations. In this paper, we explore to introduce SE(3) object pose into imitation learning and propose the pose-guided efficient imitation learning methods for robotic precise insertion task. First, we propose a precise insertion diffusion policy which utilizes the relative SE(3) pose as the observation-action pair. The policy models the source object SE(3) pose trajectory relative to the target object. Second, we explore to introduce the RGBD data to the pose-guided diffusion policy. Specifically, we design a goal-conditioned RGBD encoder to capture the discrepancy between the current state and the goal state. In addition, a pose-guided residual gated fusion method is proposed, which takes pose features as the backbone, and the RGBD features selectively compensate for pose feature deficiencies through an adaptive gating mechanism. Our methods are evaluated on 6 robotic precise insertion tasks, demonstrating competitive performance with only 7-10 demonstrations. Experiments demonstrate that the proposed methods can successfully complete precision insertion tasks with a clearance of about 0.01 mm. Experimental results highlight its superior efficiency and generalization capability compared to existing baselines. Code will be available at https://github.com/sunhan1997/PoseInsert.
Strategic Jenga Play via Graph Based Dynamics Modeling ICRA 2025
Controlled manipulation of multiple objects whose dynamics are closely linked is a challenging problem within contact-rich manipulation, requiring an understanding of how the movement of one will impact the others. Using the Jenga game as a testbed to explore this problem, we graph-based modeling to tackle two different aspects of the task: 1) block selection and 2) block extraction. For block selection, we construct graphs of the Jenga tower and attempt to classify, based on the tower's structure, whether removing a given block will cause the tower to collapse. For block extraction, we train a dynamics model that predicts how all the blocks in the tower will move at each timestep in an extraction trajectory, which we then use in a sampling-based model predictive control loop to safely pull blocks out of the tower with a general-purpose parallel-jaw gripper. We train and evaluate our methods in simulation, demonstrating promising results towards block selection and block extraction on a challenging set of full-sized Jenga towers, even at advanced stages of the game.
comment: 5 pages, Oral Spotlight at ICRA 2025 Workshop "Learning Meets Model-Based Methods for Contact-Rich Manipulation"
Improved Corner Cutting Constraints for Mixed-Integer Motion Planning of a Differential Drive Micro-Mobility Vehicle
This paper addresses the problem of motion planning for differential drive micro-mobility platforms. This class of vehicle is designed to perform small-distance transportation of passengers and goods in structured environments. Our approach leverages mixed-integer linear programming (MILP) to compute global optimal collision-free trajectories taking into account the kinematics and dynamics of the vehicle. We propose novel constraints for intersample collision avoidance and demonstrate its effectiveness using pick-up and delivery missions and statistical analysis of Monte Carlo simulations. The results show that the novel formulation provides the best trajectories in terms of time expenditure and control effort when compared to two state-of-the-art approaches.
APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression
Precise initialization plays a critical role in the performance of localization algorithms, especially in the context of robotics, autonomous driving, and computer vision. Poor localization accuracy is often a consequence of inaccurate initial poses, particularly noticeable in GNSS-denied environments where GPS signals are primarily relied upon for initialization. Recent advances in leveraging deep neural networks for pose regression have led to significant improvements in both accuracy and robustness, especially in estimating complex spatial relationships and orientations. In this paper, we introduce APR-Transformer, a model architecture inspired by state-of-the-art methods, which predicts absolute pose (3D position and 3D orientation) using either image or LiDAR data. We demonstrate that our proposed method achieves state-of-the-art performance on established benchmark datasets such as the Radar Oxford Robot-Car and DeepLoc datasets. Furthermore, we extend our experiments to include our custom complex APR-BeIntelli dataset. Additionally, we validate the reliability of our approach in GNSS-denied environments by deploying the model in real-time on an autonomous test vehicle. This showcases the practical feasibility and effectiveness of our approach. The source code is available at:https://github.com/GT-ARC/APR-Transformer.
comment: 8 pages with 6 figures
TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving
In recent years, diffusion model has shown its potential across diverse domains from vision generation to language modeling. Transferring its capabilities to modern autonomous driving systems has also emerged as a promising direction.In this work, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information serves as the multi-modal conditional input of the denoising decoder. To tackle the mode collapse dilemma in generating high-quality diverse trajectories, we introduce a simple yet effective multi-modal representation decorrelation optimization mechanism during the training process.TransDiffuser achieves PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without any anchor-based prior trajectories.
comment: Under review
Embodied Intelligent Industrial Robotics: Concepts and Techniques
In recent years, embodied intelligent robotics (EIR) has made significant progress in multi-modal perception, autonomous decision-making, and physical interaction. Some robots have already been tested in general-purpose scenarios such as homes and shopping malls. We aim to advance the research and application of embodied intelligence in industrial scenes. However, current EIR lacks a deep understanding of industrial environment semantics and the normative constraints between industrial operating objects. To address this gap, this paper first reviews the history of industrial robotics and the mainstream EIR frameworks. We then introduce the concept of the embodied intelligent industrial robotics (EIIR) and propose a knowledge-driven EIIR technology framework for industrial environments. The framework includes four main modules: world model, high-level task planner, low-level skill controller, and simulator. We also review the current development of technologies related to each module and highlight recent progress in adapting them to industrial applications. Finally, we summarize the key challenges EIIR faces in industrial scenarios and suggest future research directions. We believe that EIIR technology will shape the next generation of industrial robotics. Industrial systems based on embodied intelligent industrial robots offer strong potential for enabling intelligent manufacturing. We will continue to track and summarize new research in this area and hope this review will serve as a valuable reference for scholars and engineers interested in industrial embodied intelligence. Together, we can help drive the rapid advancement and application of this technology. The associated project can be found at https://github.com/jackeyzengl/Embodied_Intelligent_Industrial_Robotics_Paper_List.
comment: 60 pages, 11 figures. The associated project can be found at https://github.com/jackeyzengl/Embodied_Intelligent_Industrial_Robotics_Paper_List
A drone that learns to efficiently find objects in agricultural fields: from simulation to the real world ICRA
Drones are promising for data collection in precision agriculture, however, they are limited by their battery capacity. Efficient path planners are therefore required. This paper presents a drone path planner trained using Reinforcement Learning (RL) on an abstract simulation that uses object detections and uncertain prior knowledge. The RL agent controls the flight direction and can terminate the flight. By using the agent in combination with the drone's flight controller and a detection network to process camera images, it is possible to evaluate the performance of the agent on real-world data. In simulation, the agent yielded on average a 78% shorter flight path compared to a full coverage planner, at the cost of a 14% lower recall. On real-world data, the agent showed a 72% shorter flight path compared to a full coverage planner, however, at the cost of a 25% lower recall. The lower performance on real-world data was attributed to the real-world object distribution and the lower accuracy of prior knowledge, and shows potential for improvement. Overall, we concluded that for applications where it is not crucial to find all objects, such as weed detection, the learned-based path planner is suitable and efficient.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
Ethical Aspects of the Use of Social Robots in Elderly Care -- A Systematic Qualitative Review
Background: The use of social robotics in elderly care is increasingly discussed as one way of meeting emerging care needs due to scarce resources. While many potential benefits are associated with robotic care technologies, there is a variety of ethical challenges. To support steps towards a responsible implementation and use, this review develops an overview on ethical aspects of the use of social robots in elderly care from a decision-makers' perspective. Methods: Electronic databases were queried using a comprehensive search strategy based on the key concepts of "ethical aspects", "social robotics" and "elderly care". Abstract and title screening was conducted by two authors independently. Full-text screening was conducted by one author following a joint consolidation phase. Data was extracted using MAXQDA24 by one author, based on a consolidated coding framework. Analysis was performed through modified qualitative content analysis. Results: A total of 1,518 publications were screened, and 248 publications were included. We have organized our analysis in a scheme of ethical hazards, ethical opportunities and unsettled questions, identifying at least 60 broad ethical aspects affecting three different stakeholder groups. While some ethical issues are well-known and broadly discussed our analysis shows a plethora of potentially relevant aspects, often only marginally recognized, that are worthy of consideration from a practical perspective. Discussion: The findings highlight the need for a contextual and detailed evaluation of implementation scenarios. To make use of the vast knowledge of the ethical discourse, we hypothesize that decision-makers need to understand the specific nature of this discourse to be able to engage in careful ethical deliberation.
comment: 93 pages, 1 figure, 5 tables, 3 suplements
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Latent Theory of Mind: A Decentralized Diffusion Architecture for Cooperative Manipulation
We present Latent Theory of Mind (LatentToM), a decentralized diffusion policy architecture for collaborative robot manipulation. Our policy allows multiple manipulators with their own perception and computation to collaborate with each other towards a common task goal with or without explicit communication. Our key innovation lies in allowing each agent to maintain two latent representations: an ego embedding specific to the robot, and a consensus embedding trained to be common to both robots, despite their different sensor streams and poses. We further let each robot train a decoder to infer the other robot's ego embedding from their consensus embedding, akin to theory of mind in latent space. Training occurs centrally, with all the policies' consensus encoders supervised by a loss inspired by sheaf theory, a mathematical theory for clustering data on a topological manifold. Specifically, we introduce a first-order cohomology loss to enforce sheaf-consistent alignment of the consensus embeddings. To preserve the expressiveness of the consensus embedding, we further propose structural constraints based on theory of mind and a directional consensus mechanism. Execution can be fully distributed, requiring no explicit communication between policies. In which case, the information is exchanged implicitly through each robot's sensor stream by observing the actions of the other robots and their effects on the scene. Alternatively, execution can leverage direct communication to share the robots' consensus embeddings, where the embeddings are shared once during each inference step and are aligned using the sheaf Laplacian. In our hardware experiments, LatentToM outperforms a naive decentralized diffusion baseline, and shows comparable performance with a state-of-the-art centralized diffusion policy for bi-manual manipulation. Project website: https://stanfordmsl.github.io/LatentToM/.
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design (IID) and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis
Due to the deformability of garments, generating a large amount of high-quality data for robotic garment manipulation tasks is highly challenging. In this paper, we present a synthetic garment dataset that can be used for robotic garment folding. We begin by constructing geometric garment templates based on keypoints and applying generative models to generate realistic texture patterns. Leveraging these keypoint annotations, we generate folding demonstrations in simulation and train folding policies via closed-loop imitation learning. To improve robustness, we propose KG-DAgger, which uses a keypoint-based strategy to generate demonstration data for recovering from failures. KG-DAgger significantly improves the model performance, boosting the real-world success rate by 25\%. After training with 15K trajectories (about 2M image-action pairs), the model achieves a 75\% success rate in the real world. Experiments in both simulation and real-world settings validate the effectiveness of our proposed framework.
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
As autonomous robotic systems become increasingly mature, users will want to specify missions at the level of intent rather than in low-level detail. Language is an expressive and intuitive medium for such mission specification. However, realizing language-guided robotic teams requires overcoming significant technical hurdles. Interpreting and realizing language-specified missions requires advanced semantic reasoning. Successful heterogeneous robots must effectively coordinate actions and share information across varying viewpoints. Additionally, communication between robots is typically intermittent, necessitating robust strategies that leverage communication opportunities to maintain coordination and achieve mission objectives. In this work, we present a first-of-its-kind system where an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV) are able to collaboratively accomplish missions specified in natural language while reacting to changes in specification on the fly. We leverage a Large Language Model (LLM)-enabled planner to reason over semantic-metric maps that are built online and opportunistically shared between an aerial and a ground robot. We consider task-driven navigation in urban and rural areas. Our system must infer mission-relevant semantics and actively acquire information via semantic mapping. In both ground and air-ground teaming experiments, we demonstrate our system on seven different natural-language specifications at up to kilometer-scale navigation.
comment: 19 pages, 24 figures, 7 tables. Submitted to T-FR
VGC-RIO: A Tightly Integrated Radar-Inertial Odometry with Spatial Weighted Doppler Velocity and Local Geometric Constrained RCS Histograms
Recent advances in 4D radar-inertial odometry have demonstrated promising potential for autonomous lo calization in adverse conditions. However, effective handling of sparse and noisy radar measurements remains a critical challenge. In this paper, we propose a radar-inertial odometry with a spatial weighting method that adapts to unevenly distributed points and a novel point-description histogram for challenging point registration. To make full use of the Doppler velocity from different spatial sections, we propose a weighting calculation model. To enhance the point cloud registration performance under challenging scenarios, we con struct a novel point histogram descriptor that combines local geometric features and radar cross-section (RCS) features. We have also conducted extensive experiments on both public and self-constructed datasets. The results demonstrate the precision and robustness of the proposed VGC-RIO.
Imitation Learning for Adaptive Control of a Virtual Soft Exoglove
The use of wearable robots has been widely adopted in rehabilitation training for patients with hand motor impairments. However, the uniqueness of patients' muscle loss is often overlooked. Leveraging reinforcement learning and a biologically accurate musculoskeletal model in simulation, we propose a customized wearable robotic controller that is able to address specific muscle deficits and to provide compensation for hand-object manipulation tasks. Video data of a same subject performing human grasping tasks is used to train a manipulation model through learning from demonstration. This manipulation model is subsequently fine-tuned to perform object-specific interaction tasks. The muscle forces in the musculoskeletal manipulation model are then weakened to simulate neurological motor impairments, which are later compensated by the actuation of a virtual wearable robotics glove. Results shows that integrating the virtual wearable robotic glove provides shared assistance to support the hand manipulator with weakened muscle forces. The learned exoglove controller achieved an average of 90.5\% of the original manipulation proficiency.
OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions
Lane Keeping Assist (LKA) is widely adopted in modern vehicles, yet its real-world performance remains underexplored due to proprietary systems and limited data access. This paper presents OpenLKA, the first open, large-scale dataset for LKA evaluation and improvement. It includes 400 hours of driving data from 50+ production vehicle models, collected through extensive road testing in Tampa, Florida and global contributions from the Comma.ai driving community. The dataset spans a wide range of challenging scenarios, including complex road geometries, degraded lane markings, adverse weather, lighting conditions and surrounding traffic. The dataset is multimodal, comprising: i) full CAN bus streams, decoded using custom reverse-engineered DBC files to extract key LKA events (e.g., system disengagements, lane detection failures); ii) synchronized high-resolution dash-cam video; iii) real-time outputs from Openpilot, providing accurate estimates of road curvature and lane positioning; iv) enhanced scene annotations generated by Vision Language Models, describing lane visibility, pavement quality, weather, lighting, and traffic conditions. By integrating vehicle-internal signals with high-fidelity perception and rich semantic context, OpenLKA provides a comprehensive platform for benchmarking the real-world performance of production LKA systems, identifying safety-critical operational scenarios, and assessing the readiness of current road infrastructure for autonomous driving. The dataset is publicly available at: https://github.com/OpenLKA/OpenLKA.
Deployable and Generalizable Motion Prediction: Taxonomy, Open Challenges and Future Directions
Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has seen rapid progress, reflected in quickly evolving benchmark results. Yet, when state-of-the-art methods are deployed in the real world, they often struggle to generalize to open-world conditions and fall short of deployment standards. This reveals a gap between research benchmarks, which are often idealized or ill-posed, and real-world complexity. To address this gap, this survey revisits the generalization and deployability of motion prediction models, with an emphasis on the applications of robotics, autonomous driving, and human motion. We first offer a comprehensive taxonomy of motion prediction methods, covering representations, modeling strategies, application domains, and evaluation protocols. We then study two key challenges: (1) how to push motion prediction models to be deployable to realistic deployment standards, where motion prediction does not act in a vacuum, but functions as one module of closed-loop autonomy stacks - it takes input from the localization and perception, and informs downstream planning and control. 2) how to generalize motion prediction models from limited seen scenarios/datasets to the open-world settings. Throughout the paper, we highlight critical open challenges to guide future work, aiming to recalibrate the community's efforts, fostering progress that is not only measurable but also meaningful for real-world applications.
comment: Initial draft, 162 pages, 40 figures, 13 tables
A Novel 6-axis Force/Torque Sensor Using Inductance Sensors
This paper presents a novel six-axis force/torque (F/T) sensor based on inductive sensing technology. Unlike conventional strain gauge-based sensors that require direct contact and external amplification, the proposed sensor utilizes non-contact inductive measurements to estimate force via displacement of a conductive target. A compact, fully integrated architecture is achieved by incorporating a CAN-FD based signal processing module directly onto the PCB, enabling high-speed data acquisition at up to 4~kHz without external DAQ systems. The sensing mechanism is modeled and calibrated through a rational function fitting approach, which demonstrated superior performance in terms of root mean square error (RMSE), coefficient of determination ($R^2$), and linearity error compared to other nonlinear models. Static and repeatability experiments validate the sensor's accuracy, achieving a resolution of 0.03~N and quantization levels exceeding 55,000 steps, surpassing that of commercial sensors. The sensor also exhibits low crosstalk, high sensitivity, and robust noise characteristics. Its performance and structure make it suitable for precision robotic applications, especially in scenarios where compactness, non-contact operation, and integrated processing are essential.
comment: 10 pages, 8 figures
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
comment: 10 pages, 2 figures
Reach-Avoid-Stabilize Using Admissible Control Sets
Hamilton-Jacobi Reachability (HJR) analysis has been successfully used in many robotics and control tasks, and is especially effective in computing reach-avoid sets and control laws that enable an agent to reach a goal while satisfying state constraints. However, the original HJR formulation provides no guarantees of safety after a) the prescribed time horizon, or b) goal satisfaction. The reach-avoid-stabilize (RAS) problem has therefore gained a lot of focus: find the set of initial states (the RAS set), such that the trajectory can reach the target, and stabilize to some point of interest (POI) while avoiding obstacles. Solving RAS problems using HJR usually requires defining a new value function, whose zero sub-level set is the RAS set. The existing methods do not consider the problem when there are a series of targets to reach and/or obstacles to avoid. We propose a method that uses the idea of admissible control sets; we guarantee that the system will reach each target while avoiding obstacles as prescribed by the given time series. Moreover, we guarantee that the trajectory ultimately stabilizes to the POI. The proposed method provides an under-approximation of the RAS set, guaranteeing safety. Numerical examples are provided to validate the theory.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control
RT-cache: Efficient Robot Trajectory Retrieval System
This paper introduces RT-cache, a novel trajectorymemory pipeline that accelerates real-world robot inference by leveraging big-data retrieval and learning from experience. While modern Vision-Language-Action (VLA) models can handle diverse robotic tasks, they often incur high per-step inference costs, resulting in significant latency, sometimes minutes per task. In contrast, RT-cache stores a large-scale Memory of previously successful robot trajectories and retrieves relevant multistep motion snippets, drastically reducing inference overhead. By integrating a Memory Builder with a Trajectory Retrieval, we develop an efficient retrieval process that remains tractable even for extremely large datasets. RT-cache flexibly accumulates real-world experiences and replays them whenever the current scene matches past states, adapting quickly to new or unseen environments with only a few additional samples. Experiments on the Open-X Embodiment Dataset and other real-world data demonstrate that RT-cache completes tasks both faster and more successfully than a baseline lacking retrieval, suggesting a practical, data-driven solution for real-time manipulation.
comment: 9 pages, 5 figures. Submitted to an IEEE robotics conference
EdgeAI Drone for Autonomous Construction Site Demonstrator ICRA 2025
The fields of autonomous systems and robotics are receiving considerable attention in civil applications such as construction, logistics, and firefighting. Nevertheless, the widespread adoption of these technologies is hindered by the necessity for robust processing units to run AI models. Edge-AI solutions offer considerable promise, enabling low-power, cost-effective robotics that can automate civil services, improve safety, and enhance sustainability. This paper presents a novel Edge-AI-enabled drone-based surveillance system for autonomous multi-robot operations at construction sites. Our system integrates a lightweight MCU-based object detection model within a custom-built UAV platform and a 5G-enabled multi-agent coordination infrastructure. We specifically target the real-time obstacle detection and dynamic path planning problem in construction environments, providing a comprehensive dataset specifically created for MCU-based edge applications. Field experiments demonstrate practical viability and identify optimal operational parameters, highlighting our approach's scalability and computational efficiency advantages compared to existing UAV solutions. The present and future roles of autonomous vehicles on construction sites are also discussed, as well as the effectiveness of edge-AI solutions. We share our dataset publicly at github.com/egirgin/storaige-b950
comment: Paper presented at the 4th Workshop on Future of Construction: Safe, Reliable, and Precise Robots in Construction Environments, ICRA 2025, Atlanta, GA, United States
Learning Rock Pushability on Rough Planetary Terrain ICRA 2025
In the context of mobile navigation in unstructured environments, the predominant approach entails the avoidance of obstacles. The prevailing path planning algorithms are contingent upon deviating from the intended path for an indefinite duration and returning to the closest point on the route after the obstacle is left behind spatially. However, avoiding an obstacle on a path that will be used repeatedly by multiple agents can hinder long-term efficiency and lead to a lasting reliance on an active path planning system. In this study, we propose an alternative approach to mobile navigation in unstructured environments by leveraging the manipulation capabilities of a robotic manipulator mounted on top of a mobile robot. Our proposed framework integrates exteroceptive and proprioceptive feedback to assess the push affordance of obstacles, facilitating their repositioning rather than avoidance. While our preliminary visual estimation takes into account the characteristics of both the obstacle and the surface it relies on, the push affordance estimation module exploits the force feedback obtained by interacting with the obstacle via a robotic manipulator as the guidance signal. The objective of our navigation approach is to enhance the efficiency of routes utilized by multiple agents over extended periods by reducing the overall time spent by a fleet in environments where autonomous infrastructure development is imperative, such as lunar or Martian surfaces.
comment: Paper presented at the Workshop on Field Robotics, ICRA 2025, Atlanta, GA, United States
Neural Inertial Odometry from Lie Events
Neural displacement priors (NDP) can reduce the drift in inertial odometry and provide uncertainty estimates that can be readily fused with off-the-shelf filters. However, they fail to generalize to different IMU sampling rates and trajectory profiles, which limits their robustness in diverse settings. To address this challenge, we replace the traditional NDP inputs comprising raw IMU data with Lie events that are robust to input rate changes and have favorable invariances when observed under different trajectory profiles. Unlike raw IMU data sampled at fixed rates, Lie events are sampled whenever the norm of the IMU pre-integration change, mapped to the Lie algebra of the SE(3) group, exceeds a threshold. Inspired by event-based vision, we generalize the notion of level-crossing on 1D signals to level-crossings on the Lie algebra and generalize binary polarities to normalized Lie polarities within this algebra. We show that training NDPs on Lie events incorporating these polarities reduces the trajectory error of off-the-shelf downstream inertial odometry methods by up to 21% with only minimal preprocessing. We conjecture that many more sensors than IMUs or cameras can benefit from an event-based sampling paradigm and that this work makes an important first step in this direction.
comment: accepted at RSS 2025
Grasp EveryThing (GET): 1-DoF, 3-Fingered Gripper with Tactile Sensing for Robust Grasping
We introduce the Grasp EveryThing (GET) gripper, a novel 1-DoF, 3-finger design for securely grasping objects of many shapes and sizes. Mounted on a standard parallel jaw actuator, the design features three narrow, tapered fingers arranged in a two-against-one configuration, where the two fingers converge into a V-shape. The GET gripper is more capable of conforming to object geometries and forming secure grasps than traditional designs with two flat fingers. Inspired by the principle of self-similarity, these V-shaped fingers enable secure grasping across a wide range of object sizes. Further to this end, fingers are parametrically designed for convenient resizing and interchangeability across robotic embodiments with a parallel jaw gripper. Additionally, we incorporate a rigid fingernail to enhance small object manipulation. Tactile sensing can be integrated into the standalone finger via an externally-mounted camera. A neural network was trained to estimate normal force from tactile images with an average validation error of 1.3~N across a diverse set of geometries. In grasping 15 objects and performing 3 tasks via teleoperation, the GET fingers consistently outperformed standard flat fingers. Finger designs for use with multiple robotic embodiments are available on GitHub.
Neural Associative Skill Memories for safer robotics and modelling human sensorimotor repertoires
Modern robots face challenges shared by humans, where machines must learn multiple sensorimotor skills and express them adaptively. Equipping robots with a human-like memory of how it feels to do multiple stereotypical movements can make robots more aware of normal operational states and help develop self-preserving safer robots. Associative Skill Memories (ASMs) aim to address this by linking movement primitives to sensory feedback, but existing implementations rely on hard-coded libraries of individual skills. A key unresolved problem is how a single neural network can learn a repertoire of skills while enabling fault detection and context-aware execution. Here we introduce Neural Associative Skill Memories (ASMs), a framework that utilises self-supervised predictive coding for temporal prediction to unify skill learning and expression, using biologically plausible learning rules. Unlike traditional ASMs which require explicit skill selection, Neural ASMs implicitly recognize and express skills through contextual inference, enabling fault detection across learned behaviours without an explicit skill selection mechanism. Compared to recurrent neural networks trained via backpropagation through time, our model achieves comparable qualitative performance in skill memory expression while using local learning rules and predicts a biologically relevant speed-accuracy trade-off during skill memory expression. This work advances the field of neurorobotics by demonstrating how predictive coding principles can model adaptive robot control and human motor preparation. By unifying fault detection, reactive control, skill memorisation and expression into a single energy-based architecture, Neural ASMs contribute to safer robotics and provide a computational lens to study biological sensorimotor learning.
Trailblazer: Learning offroad costmaps for long range planning
Autonomous navigation in off-road environments remains a significant challenge in field robotics, particularly for Unmanned Ground Vehicles (UGVs) tasked with search and rescue, exploration, and surveillance. Effective long-range planning relies on the integration of onboard perception systems with prior environmental knowledge, such as satellite imagery and LiDAR data. This work introduces Trailblazer, a novel framework that automates the conversion of multi-modal sensor data into costmaps, enabling efficient path planning without manual tuning. Unlike traditional approaches, Trailblazer leverages imitation learning and a differentiable A* planner to learn costmaps directly from expert demonstrations, enhancing adaptability across diverse terrains. The proposed methodology was validated through extensive real-world testing, achieving robust performance in dynamic and complex environments, demonstrating Trailblazer's potential for scalable, efficient autonomous navigation.
General Dynamic Goal Recognition AAAI 2025
Understanding an agent's intent through its behavior is essential in human-robot interaction, interactive AI systems, and multi-agent collaborations. This task, known as Goal Recognition (GR), poses significant challenges in dynamic environments where goals are numerous and constantly evolving. Traditional GR methods, designed for a predefined set of goals, often struggle to adapt to these dynamic scenarios. To address this limitation, we introduce the General Dynamic GR problem - a broader definition of GR - aimed at enabling real-time GR systems and fostering further research in this area. Expanding on this foundation, this paper employs a model-free goal-conditioned RL approach to enable fast adaptation for GR across various changing tasks.
comment: Accepted for publication at Generalization in Planning (GenPlan) as part of AAAI 2025 workshops
Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems
This paper presents a risk-aware safe reinforcement learning (RL) control design for stochastic discrete-time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk-informed safe controller is also learned besides the RL controller, and the RL and safe controllers are combined together. Several advantages come along with this approach: 1) High-confidence safety can be certified without relying on a high-fidelity system model and using limited data available, 2) Myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of two stabilizing controllers, and 3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise affine controllers are learned instead of linear controllers. To this end, the closed-loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safe violation of the closed-loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed-loop system is minimized. It is shown that this control-oriented approach reduces the data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data-driven interpolation technique is introduced. This method aims to maintain the RL agent's optimal implementation while ensuring its safety within environments characterized by noise. The study concludes with a simulation example that serves to validate the theoretical results.
comment: Submitted to Asian Journal of Control
Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling
Vision language models (VLMs) exhibit vast knowledge of the physical world, including intuition of physical and spatial properties, affordances, and motion. With fine-tuning, VLMs can also natively produce robot trajectories. We demonstrate that eliciting wrenches, not trajectories, allows VLMs to explicitly reason about forces and leads to zero-shot generalization in a series of manipulation tasks without pretraining. We achieve this by overlaying a consistent visual representation of relevant coordinate frames on robot-attached camera images to augment our query. First, we show how this addition enables a versatile motion control framework evaluated across four tasks (opening and closing a lid, pushing a cup or chair) spanning prismatic and rotational motion, an order of force and position magnitude, different camera perspectives, annotation schemes, and two robot platforms over 220 experiments, resulting in 51% success across the four tasks. Then, we demonstrate that the proposed framework enables VLMs to continually reason about interaction feedback to recover from task failure or incompletion, with and without human supervision. Finally, we observe that prompting schemes with visual annotation and embodied reasoning can bypass VLM safeguards. We characterize prompt component contribution to harmful behavior elicitation and discuss its implications for developing embodied reasoning. Our code, videos, and data are available at: https://scalingforce.github.io/.
EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Robotic imitation learning has advanced from solving static tasks to addressing dynamic interaction scenarios, but testing and evaluation remain costly and challenging due to the need for real-time interaction with dynamic environments. We propose EnerVerse-AC (EVAC), an action-conditional world model that generates future visual observations based on an agent's predicted actions, enabling realistic and controllable robotic inference. Building on prior architectures, EVAC introduces a multi-level action-conditioning mechanism and ray map encoding for dynamic multi-view image generation while expanding training data with diverse failure trajectories to improve generalization. As both a data engine and evaluator, EVAC augments human-collected trajectories into diverse datasets and generates realistic, action-conditioned video observations for policy testing, eliminating the need for physical robots or complex simulations. This approach significantly reduces costs while maintaining high fidelity in robotic manipulation evaluation. Extensive experiments validate the effectiveness of our method. Code, checkpoints, and datasets can be found at .
comment: Website: https://annaj2178.github.io/EnerverseAC.github.io
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Vision-Language Models (VLMs) have revolutionized artificial intelligence and robotics due to their commonsense reasoning capabilities. In robotic manipulation, VLMs are used primarily as high-level planners, but recent work has also studied their lower-level reasoning ability, which refers to making decisions about precise robot movements. However, the community currently lacks a clear and common benchmark that can evaluate how well VLMs can aid low-level reasoning in robotics. Consequently, we propose a novel benchmark, ManipBench, to evaluate the low-level robot manipulation reasoning capabilities of VLMs across various dimensions, including how well they understand object-object interactions and deformable object manipulation. We extensively test 33 representative VLMs across 10 model families on our benchmark, including variants to test different model sizes. Our evaluation shows that the performance of VLMs significantly varies across tasks, and there is a strong correlation between this performance and trends in our real-world manipulation tasks. It also shows that there remains a significant gap between these models and human-level understanding. See our website at: https://manipbench.github.io.
comment: 47 pages, 29 figures. Under review
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Recent advances in creative AI have enabled the synthesis of high-fidelity images and videos conditioned on language instructions. Building on these developments, text-to-video diffusion models have evolved into embodied world models (EWMs) capable of generating physically plausible scenes from language commands, effectively bridging vision and action in embodied AI applications. This work addresses the critical challenge of evaluating EWMs beyond general perceptual metrics to ensure the generation of physically grounded and action-consistent behaviors. We propose the Embodied World Model Benchmark (EWMBench), a dedicated framework designed to evaluate EWMs based on three key aspects: visual scene consistency, motion correctness, and semantic alignment. Our approach leverages a meticulously curated dataset encompassing diverse scenes and motion patterns, alongside a comprehensive multi-dimensional evaluation toolkit, to assess and compare candidate models. The proposed benchmark not only identifies the limitations of existing video generation models in meeting the unique requirements of embodied tasks but also provides valuable insights to guide future advancements in the field. The dataset and evaluation tools are publicly available at https://github.com/AgibotTech/EWMBench.
comment: Website: https://github.com/AgibotTech/EWMBench
Neural Brain: A Neuroscience-inspired Framework for Embodied Agents
The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the rise of embodied AI, where autonomous agents, such as humanoid robots, must navigate and manipulate unstructured environments with human-like adaptability. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities. Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment. To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization. Furthermore, we also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence. By synthesizing insights from neuroscience, we outline a roadmap towards the development of generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.
comment: 51 pages, 17 figures, 9 tables
AdaWorld: Learning Adaptable World Models with Latent Actions ICML 2025
World models aim to learn action-controlled future prediction and have proven essential for the development of intelligent agents. However, most existing world models rely heavily on substantial action-labeled data and costly training, making it challenging to adapt to novel environments with heterogeneous actions through limited interactions. This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. This is achieved by extracting latent actions from videos in a self-supervised manner, capturing the most critical transitions between frames. We then develop an autoregressive world model that conditions on these latent actions. This learning paradigm enables highly adaptable world models, facilitating efficient transfer and learning of new actions even with limited interactions and finetuning. Our comprehensive experiments across multiple environments demonstrate that AdaWorld achieves superior performance in both simulation quality and visual planning.
comment: ICML 2025. Project page: https://adaptable-world-model.github.io/, code: https://github.com/Little-Podi/AdaWorld, model: https://huggingface.co/Little-Podi/AdaWorld
Guaranteed Rejection-free Sampling Method Using Past Behaviours for Motion Planning of Autonomous Systems
The paper presents a novel learning-based sampling strategy that guarantees rejection-free sampling of the free space under both biased and approximately uniform conditions, leveraging multivariate kernel densities. Historical data from a given autonomous system is leveraged to estimate a non-parametric probabilistic description of the domain, which also describes the free space where feasible solutions of the motion planning problem are likely to be found. The tuning parameters of the kernel density estimator, the bandwidth and the kernel, are used to alter the description of the free space so that no samples can fall outside the originally defined space.The proposed method is demonstrated in two real-life case studies: An autonomous surface vessel (2D) and an autonomous drone (3D). Two planning problems are solved, showing that the proposed approximately uniform sampling scheme is capable of guaranteeing rejection-free samples of the considered workspace. Furthermore, the effectiveness of the proposed method is statistically validated using Monte Carlo simulations.
comment: Accepted for publication in Robotics and Autonomous Systems
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
comment: Project website: https://robotics-transformer-x.github.io
Safe Navigation in Uncertain Crowded Environments Using Risk Adaptive CVaR Barrier Functions
Robot navigation in dynamic, crowded environments poses a significant challenge due to the inherent uncertainties in the obstacle model. In this work, we propose a risk-adaptive approach based on the Conditional Value-at-Risk Barrier Function (CVaR-BF), where the risk level is automatically adjusted to accept the minimum necessary risk, achieving a good performance in terms of safety and optimization feasibility under uncertainty. Additionally, we introduce a dynamic zone-based barrier function which characterizes the collision likelihood by evaluating the relative state between the robot and the obstacle. By integrating risk adaptation with this new function, our approach adaptively expands the safety margin, enabling the robot to proactively avoid obstacles in highly dynamic environments. Comparisons and ablation studies demonstrate that our method outperforms existing social navigation approaches, and validate the effectiveness of our proposed framework.
ON as ALC: Active Loop Closing Object Goal Navigation
In simultaneous localization and mapping, active loop closing (ALC) is an active vision problem that aims to visually guide a robot to maximize the chances of revisiting previously visited points, thereby resetting the drift errors accumulated in the incrementally built map during travel. However, current mainstream navigation strategies that leverage such incomplete maps as workspace prior knowledge often fail in modern long-term autonomy long-distance travel scenarios where map accumulation errors become significant. To address these limitations of map-based navigation, this paper is the first to explore mapless navigation in the embodied AI field, in particular, to utilize object-goal navigation (commonly abbreviated as ON, ObjNav, or OGN) techniques that efficiently explore target objects without using such a prior map. Specifically, in this work, we start from an off-the-shelf mapless ON planner, extend it to utilize a prior map, and further show that the performance in long-distance ALC (LD-ALC) can be maximized by minimizing ``ALC loss" and ``ON loss". This study highlights a simple and effective approach, called ALC-ON (ALCON), to accelerate the progress of challenging long-distance ALC technology by leveraging the growing frontier-guided, data-driven, and LLM-guided ON technologies.
comment: Draft version of a conference paper with 7 pages, 5 figures, and 1 table
Learning Autonomy: Off-Road Navigation Enhanced by Human Input
In the area of autonomous driving, navigating off-road terrains presents a unique set of challenges, from unpredictable surfaces like grass and dirt to unexpected obstacles such as bushes and puddles. In this work, we present a novel learning-based local planner that addresses these challenges by directly capturing human driving nuances from real-world demonstrations using only a monocular camera. The key features of our planner are its ability to navigate in challenging off-road environments with various terrain types and its fast learning capabilities. By utilizing minimal human demonstration data (5-10 mins), it quickly learns to navigate in a wide array of off-road conditions. The local planner significantly reduces the real world data required to learn human driving preferences. This allows the planner to apply learned behaviors to real-world scenarios without the need for manual fine-tuning, demonstrating quick adjustment and adaptability in off-road autonomous driving technology.
F$^3$Loc: Fusion and Filtering for Floorplan Localization CVPR 2024
In this paper we propose an efficient data-driven solution to self-localization within a floorplan. Floorplan data is readily available, long-term persistent and inherently robust to changes in the visual appearance. Our method does not require retraining per map and location or demand a large database of images of the area of interest. We propose a novel probabilistic model consisting of an observation and a novel temporal filtering module. Operating internally with an efficient ray-based representation, the observation module consists of a single and a multiview module to predict horizontal depth from images and fuses their results to benefit from advantages offered by either methodology. Our method operates on conventional consumer hardware and overcomes a common limitation of competing methods that often demand upright images. Our full system meets real-time requirements, while outperforming the state-of-the-art by a significant margin.
comment: 10 pages, 11 figure, accepted to CVPR 2024 (fixed typo eq.8: s_x,s_y, s_phi -> x, y, phi)
Soft Arm-Motor Thrust Characterization for a Pneumatically Actuated Soft Morphing Quadrotor
In this work, an experimental characterization of the configuration space of a soft, pneumatically actuated morphing quadrotor is presented, with a focus on precise thrust characterization of its flexible arms, considering the effect of downwash. Unlike traditional quadrotors, the soft drone has pneumatically actuated arms, introducing complex, nonlinear interactions between motor thrust and arm deformation, which make precise control challenging. The silicone arms are actuated using differential pressure to achieve flexibility and thus have a variable workspace compared to their fixed counter-parts. The deflection of the soft arms during compression and expansion is controlled throughout the flight. However, in real time, the downwash from the motor attached at the tip of the soft arm generates a significant and random disturbance on the arm. This disturbance affects both the desired deflection of the arm and the overall stability of the system. To address this factor, an experimental characterization of the effect of downwash on the deflection angle of the arm is conducted.
comment: This extended abstract was accepted for RoboSoft Conference, 2025 but later withdrawn
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans? ICRA2025
Designing planners and controllers for contact-rich manipulation is extremely challenging as contact violates the smoothness conditions that many gradient-based controller synthesis tools assume. Contact smoothing approximates a non-smooth system with a smooth one, allowing one to use these synthesis tools more effectively. However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored. This paper analyzes the efficacy of linear controller synthesis using differential simulators based on contact smoothing. We introduce natural baselines for leveraging contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains to stabilize around open-loop plans. Using robotic bimanual whole-body manipulation as a testbed, we perform extensive empirical experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans. The video summarizing this paper and hardware experiments is found here: https://youtu.be/HLaKi6qbwQg?si=_zCAmBBD6rGSitm9.
comment: ICRA2025
Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies
An end-to-end (E2E) visuomotor policy is typically treated as a unified whole, but recent approaches using out-of-domain (OOD) data to pretrain the visual encoder have cleanly separated the visual encoder from the network, with the remainder referred to as the policy. We propose Visual Alignment Testing, an experimental framework designed to evaluate the validity of this functional separation. Our results indicate that in E2E-trained models, visual encoders actively contribute to decision-making resulting from motor data supervision, contradicting the assumed functional separation. In contrast, OOD-pretrained models, where encoders lack this capability, experience an average performance drop of 42\% in our benchmark results, compared to the state-of-the-art performance achieved by E2E policies. We believe this initial exploration of visual encoders' role can provide a first step towards guiding future pretraining methods to address their decision-making ability, such as developing task-conditioned or context-aware encoders.
METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance ICRA
Multi-modal end-to-end autonomous driving has shown promising advancements in recent work. By embedding more modalities into end-to-end networks, the system's understanding of both static and dynamic aspects of the driving environment is enhanced, thereby improving the safety of autonomous driving. In this paper, we introduce METDrive, an end-to-end system that leverages temporal guidance from the embedded time series features of ego states, including rotation angles, steering, throttle signals, and waypoint vectors. The geometric features derived from perception sensor data and the time series features of ego state data jointly guide the waypoint prediction with the proposed temporal guidance loss function. We evaluated METDrive on the CARLA leaderboard benchmarks, achieving a driving score of 70%, a route completion score of 94%, and an infraction score of 0.78.
comment: Accepted by ICRA
UAV-VLPA*: A Vision-Language-Path-Action System for Optimal Route Generation on a Large Scales
The UAV-VLPA* (Visual-Language-Planning-and-Action) system represents a cutting-edge advancement in aerial robotics, designed to enhance communication and operational efficiency for unmanned aerial vehicles (UAVs). By integrating advanced planning capabilities, the system addresses the Traveling Salesman Problem (TSP) to optimize flight paths, reducing the total trajectory length by 18.5\% compared to traditional methods. Additionally, the incorporation of the A* algorithm enables robust obstacle avoidance, ensuring safe and efficient navigation in complex environments. The system leverages satellite imagery processing combined with the Visual Language Model (VLM) and GPT's natural language processing capabilities, allowing users to generate detailed flight plans through simple text commands. This seamless fusion of visual and linguistic analysis empowers precise decision-making and mission planning, making UAV-VLPA* a transformative tool for modern aerial operations. With its unmatched operational efficiency, navigational safety, and user-friendly functionality, UAV-VLPA* sets a new standard in autonomous aerial robotics, paving the way for future innovations in the field.
comment: arXiv admin note: text overlap with arXiv:2501.05014
Morphological-Symmetry-Equivariant Heterogeneous Graph Neural Network for Robotic Dynamics Learning
We present a morphological-symmetry-equivariant heterogeneous graph neural network, namely MS-HGNN, for robotic dynamics learning, that integrates robotic kinematic structures and morphological symmetries into a single graph network. These structural priors are embedded into the learning architecture as constraints, ensuring high generalizability, sample and model efficiency. The proposed MS-HGNN is a versatile and general architecture that is applicable to various multi-body dynamic systems and a wide range of dynamics learning problems. We formally prove the morphological-symmetry-equivariant property of our MS-HGNN and validate its effectiveness across multiple quadruped robot learning problems using both real-world and simulated data. Our code is made publicly available at https://github.com/lunarlab-gatech/MorphSym-HGNN/.
Dexterous Contact-Rich Manipulation via the Contact Trust Region
What is a good local description of contact dynamics for contact-rich manipulation, and where can we trust this local description? While many approaches often rely on the Taylor approximation of dynamics with an ellipsoidal trust region, we argue that such approaches are fundamentally inconsistent with the unilateral nature of contact. As a remedy, we present the Contact Trust Region (CTR), which captures the unilateral nature of contact while remaining efficient for computation. With CTR, we first develop a Model-Predictive Control (MPC) algorithm capable of synthesizing local contact-rich plans. Then, we extend this capability to plan globally by stitching together local MPC plans, enabling efficient and dexterous contact-rich manipulation. To verify the performance of our method, we perform comprehensive evaluations, both in high-fidelity simulation and on hardware, on two contact-rich systems: a planar IiwaBimanual system and a 3D AllegroHand system. On both systems, our method offers a significantly lower-compute alternative to existing RL-based approaches to contact-rich manipulation. In particular, our Allegro in-hand manipulation policy, in the form of a roadmap, takes fewer than 10 minutes to build offline on a standard laptop using just its CPU, with online inference taking just a few seconds. Experiment data, video and code are available at ctr.theaiinstitute.com.
RobotMover: Learning to Move Large Objects From Human Demonstrations
Moving large objects, such as furniture or appliances, is a critical capability for robots operating in human environments. This task presents unique challenges, including whole-body coordination to avoid collisions and managing the dynamics of bulky, heavy objects. In this work, we present RobotMover, a learning-based system for large object manipulation that uses human-object interaction demonstrations to train robot control policies. RobotMover formulates the manipulation problem as imitation learning using a simplified spatial representation called the Interaction Chain, which captures essential interaction dynamics in a way that generalizes across different robot bodies. We incorporate this Interaction Chain into a reward function and train policies in simulation using domain randomization to enable zero-shot transfer to real-world robots. The resulting policies allow a Spot robot to manipulate various large objects, including chairs, tables, and standing lamps. Through extensive experiments in both simulation and the real world, we show that RobotMover achieves strong performance in terms of capability, robustness, and controllability, outperforming both learned and teleoperation baselines. The system also supports practical applications by combining learned policies with simple planning modules to perform long-horizon object transport and rearrangement tasks.
CRADMap: Applied Distributed Volumetric Mapping with 5G-Connected Multi-Robots and 4D Radar Perception
Sparse and feature SLAM methods provide robust camera pose estimation. However, they often fail to capture the level of detail required for inspection and scene awareness tasks. Conversely, dense SLAM approaches generate richer scene reconstructions but impose a prohibitive computational load to create 3D maps. We present a novel distributed volumetric mapping framework designated as CRADMap that addresses these issues by extending the state-of-the-art (SOTA) ORBSLAM3 system with the COVINS on the backend for global optimization. Our pipeline for volumetric reconstruction fuses dense keyframes at a centralized server via 5G connectivity, aggregating geometry, and occupancy information from multiple autonomous mobile robots (AMRs) without overtaxing onboard resources. This enables each AMR to independently perform mapping while the backend constructs high-fidelity real-time 3D maps. To operate Beyond the Visible (BtV) and overcome the limitations of standard visual sensors, we automated a standalone 4D mmWave radar module that functions independently without sensor fusion with SLAM. The BtV system enables the detection and mapping of occluded metallic objects in cluttered environments, enhancing situational awareness in inspection scenarios. Experimental validation in Section~\ref{sec:IV} demonstrates the effectiveness of our framework.
comment: 7 pages, 5 figures, IEEE, ICARM
Physical synchronization of soft self-oscillating limbs for fast and autonomous locomotion
Animals achieve robust locomotion by offloading regulation from the brain to physical couplings within the body. In contrast, locomotion in artificial systems often depends on centralized processors. We introduce a rapid and autonomous locomotion strategy with synchronized gaits emerging through physical interactions between self-oscillating limbs and the environment, without control signals. Each limb is a single soft tube that only requires constant flow of air to perform cyclic stepping motions at frequencies reaching 300 hertz. By combining several of these self-oscillating limbs, their physical synchronization enables locomotion speeds that are orders of magnitude faster than comparable state-of-the-art. Through body-environment dynamics, these seemingly simple devices exhibit autonomy, including obstacle avoidance, amphibious gait transitions, and phototaxis.
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. Unfortunately, RL can be prohibitively expensive, in terms of on-robot runtime, due to inefficient exploration when learning from a sparse reward signal. Designing dense reward functions is labour-intensive and requires domain expertise. In our work, we propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations. By using videos without actions, our method is easier to scale, as we can use arbitrary videos. GCR combines two loss functions, an implicit value loss function that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories. We perform experiments in simulated manipulation environments across RoboMimic and MimicGen tasks, as well as in the real world using a Franka arm and a Spot quadruped. We find that GCR leads to a more-sample efficient RL, enabling model-free RL to solve about twice as many tasks as our baseline reward learning methods. We also demonstrate positive cross-embodiment transfer from videos of people and of other robots performing a task. Website: https://gcr-robot.github.io/.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans.
comment: Code and videos at https://nicklashansen.com/rlpuppeteer
BiFlex: A Passive Bimodal Stiffness Flexible Wrist for Manipulation in Unstructured Environments
Robotic manipulation in unstructured, humancentric environments poses a dual challenge: achieving the precision need for delicate free-space operation while ensuring safety during unexpected contact events. Traditional wrists struggle to balance these demands, often relying on complex control schemes or complicated mechanical designs to mitigate potential damage from force overload. In response, we present BiFlex, a flexible robotic wrist that uses a soft buckling honeycomb structure to provides a natural bimodal stiffness response. The higher stiffness mode enables precise household object manipulation, while the lower stiffness mode provides the compliance needed to adapt to external forces. We design BiFlex to maintain a fingertip deflection of less than 1 cm while supporting loads up to 500g and create a BiFlex wrist for many grippers, including Panda, Robotiq, and BaRiFlex. We validate BiFlex under several real-world experimental evaluations, including surface wiping, precise pick-and-place, and grasping under environmental constraints. We demonstrate that BiFlex simplifies control while maintaining precise object manipulation and enhanced safety in real-world applications.
comment: 8 pages, 10 figures
Pitch-axis supermanoeuvrability in a biomimetic morphing-wing UAV
Birds and bats are extremely adept flyers: whether in hunting prey, or evading predators, post-stall manoeuvrability is a characteristic of vital importance. Their performance, in this regard, greatly exceeds that of uncrewed aerial vehicles (UAVs) of similar scale. Attempts to attain post-stall manoeuvrability, or supermanoeuvrability, in UAVs have typically focused on thrust-vectoring technology. Here we show that biomimetic wing morphing offers an additional pathway to classical supermanoeuvrability, as well as novel forms of bioinspired post-stall manoeuvrability. Using a state-of-the-art flight simulator, equipped with a multibody model of lifting surface motion and a delay differential equation (Goman-Khrabrov) dynamic stall model for all lifting surfaces, we demonstrate the capability of a biomimetic morphing-wing UAV for two post-stall manoeuvres: a classical rapid nose-pointing-and-shooting (RaNPAS) manoeuvre; and a wall landing manoeuvre inspired by biological ballistic transitions. We develop a guidance method for these manoeuvres, based on parametric variation of nonlinear longitudinal stability profiles, which allows efficient exploration of the space of post-stall manoeuvres in these types of UAVs; and yields insight into effective morphing kinematics to enable these manoeuvres. Our results demonstrate the capability of biomimetic morphing, and morphing control of nonlinear longitudinal stability, to enable advanced forms of transient supermanoeuvrability in UAVs.
Multiagent Systems
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms, leading to cultural homogenization and limiting their ability to reflect global civilizational plurality. Existing benchmarking frameworks fail to adequately capture this bias, as they rely on rigid, closed-form assessments that overlook the complexity of cultural inclusivity. To address this, we introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews. Our approach is grounded in the Multiplex Worldview proposed by Senturk et al., which distinguishes between Uniplex models, reinforcing cultural homogenization, and Multiplex models, which integrate diverse perspectives. WorldView-Bench measures Cultural Polarization, the exclusion of alternative perspectives, through free-form generative evaluation rather than conventional categorical benchmarks. We implement applied multiplexity through two intervention strategies: (1) Contextually-Implemented Multiplex LLMs, where system prompts embed multiplexity principles, and (2) Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents representing distinct cultural perspectives collaboratively generate responses. Our results demonstrate a significant increase in Perspectives Distribution Score (PDS) entropy from 13% at baseline to 94% with MAS-Implemented Multiplex LLMs, alongside a shift toward positive sentiment (67.7%) and enhanced cultural balance. These findings highlight the potential of multiplex-aware AI evaluation in mitigating cultural bias in LLMs, paving the way for more inclusive and ethically aligned AI systems.
comment: Preprint. Submitted to the Journal of Artificial Intelligence Research (JAIR) on April 29, 2025
Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps
Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.
Streaming Multi-agent Pathfinding IJCAI2025
The task of the multi-agent pathfinding (MAPF) problem is to navigate a team of agents from their start point to the goal points. However, this setup is unsuitable in the assembly line scenario, which is periodic with a long working hour. To address this issue, the study formalizes the streaming MAPF (S-MAPF) problem, which assumes that the agents in the same agent stream have a periodic start time and share the same action sequence. The proposed solution, Agent Stream Conflict-Based Search (ASCBS), is designed to tackle this problem by incorporating a cyclic vertex/edge constraint to handle conflicts. Additionally, this work explores the potential usage of the disjoint splitting strategy within ASCBS. Experimental results indicate that ASCBS surpasses traditional MAPF solvers in terms of runtime for scenarios with prolonged working hours.
comment: to be published in IJCAI2025
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
The rapid rise of large language models (LLMs) has shifted artificial intelligence (AI) research toward agentic systems, motivating the use of weaker and more flexible notions of agency. However, this shift raises key questions about the extent to which LLM-based agents replicate human strategic reasoning, particularly in game-theoretic settings. In this context, we examine the role of agentic sophistication in shaping artificial reasoners' performance by evaluating three agent designs: a simple game-theoretic model, an unstructured LLM-as-agent model, and an LLM integrated into a traditional agentic framework. Using guessing games as a testbed, we benchmarked these agents against human participants across general reasoning patterns and individual role-based objectives. Furthermore, we introduced obfuscated game scenarios to assess agents' ability to generalise beyond training distributions. Our analysis, covering over 2000 reasoning samples across 25 agent configurations, shows that human-inspired cognitive structures can enhance LLM agents' alignment with human strategic behaviour. Still, the relationship between agentic design complexity and human-likeness is non-linear, highlighting a critical dependence on underlying LLM capabilities and suggesting limits to simple architectural augmentation.
SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation
Contemporary approaches to agent-based modeling (ABM) of social systems have traditionally emphasized rule-based behaviors, limiting their ability to capture nuanced dynamics by moving beyond predefined rules and leveraging contextual understanding from LMs of human social interaction. This paper presents SALM (Social Agent LM Framework), a novel approach for integrating language models (LMs) into social network simulation that achieves unprecedented temporal stability in multi-agent scenarios. Our primary contributions include: (1) a hierarchical prompting architecture enabling stable simulation beyond 4,000 timesteps while reducing token usage by 73%, (2) an attention-based memory system achieving 80% cache hit rates (95% CI [78%, 82%]) with sub-linear memory growth of 9.5%, and (3) formal bounds on personality stability. Through extensive validation against SNAP ego networks, we demonstrate the first LLM-based framework capable of modeling long-term social phenomena while maintaining empirically validated behavioral fidelity.
Chisme: Fully Decentralized Differentiated Deep Learning for Edge Intelligence
As demand for intelligent services rises and edge devices become more capable, distributed learning at the network edge has emerged as a key enabling technology. While existing paradigms like federated learning (FL) and decentralized FL (DFL) enable privacy-preserving distributed learning in many scenarios, they face potential challenges in connectivity and synchronization imposed by resource-constrained and infrastructure-less environments. While more robust, gossip learning (GL) algorithms have generally been designed for homogeneous data distributions and may not suit all contexts. This paper introduces Chisme, a novel suite of protocols designed to address the challenges of implementing robust intelligence in the network edge, characterized by heterogeneous data distributions, episodic connectivity, and lack of infrastructure. Chisme includes both synchronous DFL (Chisme-DFL) and asynchronous GL (Chisme-GL) variants that enable collaborative yet decentralized model training that considers underlying data heterogeneity. We introduce a data similarity heuristic that allows agents to opportunistically infer affinity with each other using the existing communication of model updates in decentralized FL and GL. We leverage the heuristic to extend DFL's model aggregation and GL's model merge mechanisms for better personalized training while maintaining collaboration. While Chisme-DFL is a synchronous decentralized approach whose resource utilization scales linearly with network size, Chisme-GL is fully asynchronous and has a lower, constant resource requirement independent of network size. We demonstrate that Chisme methods outperform their standard counterparts in model training over distributed and heterogeneous data in network scenarios ranging from less connected and reliable networks to fully connected and lossless networks.
comment: This work has been submitted to the IEEE for possible publication
Hamilton's Rule for Enabling Altruism in Multi-Agent Systems
This paper explores the application of Hamilton's rule to altruistic decision-making in multi-agent systems. Inspired by biological altruism, we introduce a framework that evaluates when individual agents should incur costs to benefit their neighbors. By adapting Hamilton's rule, we define agent ``fitness" in terms of task productivity rather than genetic survival. We formalize altruistic decision-making through a graph-based model of multi-agent interactions and propose a solution using collaborative control Lyapunov functions. The approach ensures that altruistic behaviors contribute to the collective goal-reaching efficiency of the system. We illustrate this framework on a multi-agent way-point navigation problem, where we show through simulation how agent importance levels influence altruistic decision-making, leading to improved coordination in navigation tasks.
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent 'cohesive' subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
Community-based Multi-Agent Reinforcement Learning with Transfer and Active Exploration
We propose a new framework for multi-agent reinforcement learning (MARL), where the agents cooperate in a time-evolving network with latent community structures and mixed memberships. Unlike traditional neighbor-based or fixed interaction graphs, our community-based framework captures flexible and abstract coordination patterns by allowing each agent to belong to multiple overlapping communities. Each community maintains shared policy and value functions, which are aggregated by individual agents according to personalized membership weights. We also design actor-critic algorithms that exploit this structure: agents inherit community-level estimates for policy updates and value learning, enabling structured information sharing without requiring access to other agents' policies. Importantly, our approach supports both transfer learning by adapting to new agents or tasks via membership estimation, and active learning by prioritizing uncertain communities during exploration. Theoretically, we establish convergence guarantees under linear function approximation for both actor and critic updates. To our knowledge, this is the first MARL framework that integrates community structure, transferability, and active learning with provable guarantees.
The Hidden Bloat in Machine Learning Systems
Software bloat refers to code and features that is not used by a software during runtime. For Machine Learning (ML) systems, bloat is a major contributor to their technical debt leading to decreased performance and resource wastage. In this work, we present, Negativa-ML, a novel tool to identify and remove bloat in ML frameworks by analyzing their shared libraries. Our approach includes novel techniques to detect and locate unnecessary code within device code - a key area overlooked by existing research, which focuses primarily on host code. We evaluate Negativa-ML using four popular ML frameworks across ten workloads over 300 shared libraries. The results demonstrate that the ML frameworks are highly bloated on both the device and host code side. On average, Negativa-ML reduces the device code size in these frameworks by up to 75% and the host code by up to 72%, resulting in total file size reductions of up to 55%. The device code is a primary source of bloat within ML frameworks. Through debloating, we achieve reductions in peak host memory usage, peak GPU memory usage, and execution time by up to 74.6%, 69.6%, and 44.6%, respectively.
Intelligent Product 3.0: Decentralised AI Agents and Web3 Intelligence Standards
Twenty-five years ago, the specification of the Intelligent Product was established, envisaging real-time connectivity that not only enables products to gather accurate data about themselves but also allows them to assess and influence their own destiny. Early work by the Auto-ID project focused on creating a single, open-standard repository for storing and retrieving product information, laying a foundation for scalable connectivity. A decade later, the approach was revisited in light of low-cost RFID systems that promised a low-cost link between physical goods and networked information environments. Since then, advances in blockchain, Web3, and artificial intelligence have introduced unprecedented levels of resilience, consensus, and autonomy. By leveraging decentralised identity, blockchain-based product information and history, and intelligent AI-to-AI collaboration, this paper examines these developments and outlines a new specification for the Intelligent Product 3.0, illustrating how decentralised and AI-driven capabilities facilitate seamless interaction between physical AI and everyday products.
comment: 18 pages, 1 Figure, 3 Tables; Corrected typo in Section 3.4 heading
Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis AAMAS'25
Climate policy development faces significant challenges due to deep uncertainty, complex system dynamics, and competing stakeholder interests. Climate simulation methods, such as Earth System Models, have become valuable tools for policy exploration. However, their typical use is for evaluating potential polices, rather than directly synthesizing them. The problem can be inverted to optimize for policy pathways, but the traditional optimization approaches often struggle with non-linear dynamics, heterogeneous agents, and comprehensive uncertainty quantification. We propose a framework for augmenting climate simulations with Multi-Agent Reinforcement Learning (MARL) to address these limitations. We identify key challenges at the interface between climate simulations and the application of MARL in the context of policy synthesis, including reward definition, scalability with increasing agents and state spaces, uncertainty propagation across linked systems, and solution validation. Additionally, we discuss challenges in making MARL-derived solutions interpretable and useful for policy-makers. Our framework provides a foundation for more sophisticated climate policy exploration while acknowledging important limitations and areas for future research.
comment: Published in AAMAS'25 Blue Sky Ideas Track
Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation
Cognitive abilities, such as Theory of Mind (ToM), play a vital role in facilitating cooperation in human social interactions. However, our study reveals that agents with higher ToM abilities may not necessarily exhibit better cooperative behavior compared to those with lower ToM abilities. To address this challenge, we propose a novel matching coalition mechanism that leverages the strengths of agents with different ToM levels by explicitly considering belief alignment and specialized abilities when forming coalitions. Our proposed matching algorithm seeks to find stable coalitions that maximize the potential for cooperative behavior and ensure long-term viability. By incorporating cognitive insights into the design of multi-agent systems, our work demonstrates the potential of leveraging ToM to create more sophisticated and human-like coordination strategies that foster cooperation and improve overall system performance.
Centrally Coordinated Multi-Agent Reinforcement Learning for Power Grid Topology Control
Power grid operation is becoming more complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions have encouraged the use of artificial agents to assist human dispatchers in operating power grids. However, the combinatorial nature of the action space poses a challenge to both conventional optimizers and learned controllers. Action space factorization, which breaks down decision-making into smaller sub-tasks, is one approach to tackle the curse of dimensionality. In this study, we propose a centrally coordinated multi-agent (CCMA) architecture for action space factorization. In this approach, regional agents propose actions and subsequently a coordinating agent selects the final action. We investigate several implementations of the CCMA architecture, and benchmark in different experimental settings against various L2RPN baseline approaches. The CCMA architecture exhibits higher sample efficiency and superior final performance than the baseline approaches. The results suggest high potential of the CCMA approach for further application in higher-dimensional L2RPN as well as real-world power grid settings.
comment: Accepted version to The 16th ACM International Conference on Future and Sustainable Energy Systems. The final published version is available at 10.1145/3679240.3734602
Systems and Control (CS)
Regulation without calibration
This article revisits the importance of the internal model principle in the literature of regulation and synchronization. Trajectory regulation, the task of regulating continuous-time signals generated by differential equations, is contrasted with event regulation, the task of only regulating discrete events associated with the trajectories. In trajectory regulation, the internal model principle requires an exact internal generator of the continuous-time trajectories, which translates into unrealistic calibration requirements. Event regulation is envisioned as a way to relieve calibration of the continuous behavior while ensuring reliability of the discrete events.
comment: Submitted to IEEE Control Systems Magazine
Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps
Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
Improved Corner Cutting Constraints for Mixed-Integer Motion Planning of a Differential Drive Micro-Mobility Vehicle
This paper addresses the problem of motion planning for differential drive micro-mobility platforms. This class of vehicle is designed to perform small-distance transportation of passengers and goods in structured environments. Our approach leverages mixed-integer linear programming (MILP) to compute global optimal collision-free trajectories taking into account the kinematics and dynamics of the vehicle. We propose novel constraints for intersample collision avoidance and demonstrate its effectiveness using pick-up and delivery missions and statistical analysis of Monte Carlo simulations. The results show that the novel formulation provides the best trajectories in terms of time expenditure and control effort when compared to two state-of-the-art approaches.
Coordinated Multi-Valve Disturbance-Rejection Pressure Control for High-Altitude Test Stands via Exterior Penalty Functions
High altitude simulation test benches for aero engines employ multi chamber, multi valve intake systems that demand effective decoupling and strong disturbance rejection during transient tests. This paper proposes a coordinated active disturbance rejection control (ADRC) scheme based on an external penalty function. The chamber pressure safety limit is reformulated as an inequality constrained optimization problem, and an exponential penalty together with a gradient based algorithm is designed for dynamic constraint relaxation, with global convergence rigorously proven. A coordination term is then integrated into a distributed ADRC framework to yield a multi valve coordinated LADRC controller, whose asymptotic stability is established via Lyapunov theory. Hardware in the loop simulations using MATLAB/Simulink and a PLC demonstrate that, under $\pm$3 kPa pressure constraints, chamber V2's maximum error is 1.782 kPa (77.1\% lower than PID control), and under a 180 kg/s^2 flow rate disturbance, valve oscillations decrease from $\pm$27\% to $\pm$5\% (an 81.5\% reduction). These results confirm the proposed method's superior disturbance rejection and decoupling performance.
Adaptive control for multi-scale stochastic dynamical systems with stochastic next generation reservoir computing
The rapid advancement of neuroscience and machine learning has established data-driven stochastic dynamical system modeling as a powerful tool for understanding and controlling high-dimensional, spatio-temporal processes. We introduce the stochastic next-generation reservoir computing (NG-RC) controller, a framework that integrates the computational efficiency of NG-RC with stochastic analysis to enable robust event-triggered control in multiscale stochastic systems. The asymptotic stability of the controller is rigorously proven via an extended stochastic LaSalle theorem, providing theoretical guarantees for amplitude regulation in nonlinear stochastic dynamics. Numerical experiments on a stochastic Van-der-Pol system subject to both additive and multiplicative noise validate the algorithm, demonstrating its convergence rate across varying temporal scales and noise intensities. To bridge theoretical insights with real-world applications, we deploy the controller to modulate pathological dynamics reconstructed from epileptic EEG data. This work advances a theoretically guaranteed scalable framework for adaptive control of stochastic systems, with broad potential for data-driven decision making in engineering, neuroscience, and beyond.
comment: 30 pages, 14 figures
Data-driven Internal Model Control for Output Regulation
Output regulation is a fundamental problem in control theory, extensively studied since the 1970s. Traditionally, research has primarily addressed scenarios where the system model is explicitly known, leaving the problem in the absence of a system model less explored. Leveraging the recent advancements in Willems et al.'s fundamental lemma, data-driven control has emerged as a powerful tool for stabilizing unknown systems. This paper tackles the output regulation problem for unknown single and multi-agent systems (MASs) using noisy data. Previous approaches have attempted to solve data-based output regulation equations (OREs), which are inadequate for achieving zero tracking error with noisy data. To circumvent the need for solving data-based OREs, we propose an internal model-based data-driven controller that reformulates the output regulation problem into a stabilization problem. This method is first applied to linear time-invariant (LTI) systems, demonstrating exact solution capabilities, i.e., zero tracking error, through solving a straightforward data-based linear matrix inequality (LMI). Furthermore, we extend our approach to solve the $k$th-order output regulation problem for nonlinear systems. Extensions to both linear and nonlinear MASs are discussed. Finally, numerical tests validate the effectiveness and correctness of the proposed controllers.
Some Computational Tools for Solving a Selection of Problems in Control Theory
This paper demonstrates how certified computational tools can be used to address various problems in control theory. In particular, we introduce PACE.jl, a Julia package that implements symbolic elimination techniques, including (among others) discriminant varieties and Rational Univariate Representation, while also supporting multi-precision interval computations. We showcase its applications to key control theory problems, including identification, stability analysis, and optimization, for both parameter-dependent and parameter-free systems.
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design (IID) and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
comment: 10 pages, 2 figures
Reach-Avoid-Stabilize Using Admissible Control Sets
Hamilton-Jacobi Reachability (HJR) analysis has been successfully used in many robotics and control tasks, and is especially effective in computing reach-avoid sets and control laws that enable an agent to reach a goal while satisfying state constraints. However, the original HJR formulation provides no guarantees of safety after a) the prescribed time horizon, or b) goal satisfaction. The reach-avoid-stabilize (RAS) problem has therefore gained a lot of focus: find the set of initial states (the RAS set), such that the trajectory can reach the target, and stabilize to some point of interest (POI) while avoiding obstacles. Solving RAS problems using HJR usually requires defining a new value function, whose zero sub-level set is the RAS set. The existing methods do not consider the problem when there are a series of targets to reach and/or obstacles to avoid. We propose a method that uses the idea of admissible control sets; we guarantee that the system will reach each target while avoiding obstacles as prescribed by the given time series. Moreover, we guarantee that the trajectory ultimately stabilizes to the POI. The proposed method provides an under-approximation of the RAS set, guaranteeing safety. Numerical examples are provided to validate the theory.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control
``Sim2real gap", in which the system learned in simulations is not the exact representation of the real system, can lead to loss of stability and performance when controllers learned using data from the simulated system are used on the real system. In this work, we address this challenge in the linear quadratic regulator (LQR) setting. Specifically, we consider an LQR problem for a system with unknown system matrices. Along with the state-action pairs from the system to be controlled, a trajectory of length $S$ of state-action pairs from a different unknown system is available. Our proposed algorithm is constructed upon Thompson sampling and utilizes the mean as well as the uncertainty of the dynamics of the system from which the trajectory of length $S$ is obtained. We establish that the algorithm achieves $\tilde{\mathcal{O}}({f(S,M_{\delta})\sqrt{T/S}})$ Bayes regret after $T$ time steps, where $M_{\delta}$ characterizes the \emph{dissimilarity} between the two systems and $f(S,M_{\delta})$ is a function of $S$ and $M_{\delta}$. When $M_{\delta}$ is sufficiently small, the proposed algorithm achieves $\tilde{\mathcal{O}}({\sqrt{T/S}})$ Bayes regret and outperforms a naive strategy which does not utilize the available trajectory.
Hamilton's Rule for Enabling Altruism in Multi-Agent Systems
This paper explores the application of Hamilton's rule to altruistic decision-making in multi-agent systems. Inspired by biological altruism, we introduce a framework that evaluates when individual agents should incur costs to benefit their neighbors. By adapting Hamilton's rule, we define agent ``fitness" in terms of task productivity rather than genetic survival. We formalize altruistic decision-making through a graph-based model of multi-agent interactions and propose a solution using collaborative control Lyapunov functions. The approach ensures that altruistic behaviors contribute to the collective goal-reaching efficiency of the system. We illustrate this framework on a multi-agent way-point navigation problem, where we show through simulation how agent importance levels influence altruistic decision-making, leading to improved coordination in navigation tasks.
Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses
State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts law task that involved the aperture of the cursor and the control of orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
Measuring Flexibility through Reduction Potential
While electric vehicles (EVs) often exhibit substantial flexibility, harnessing this flexibility requires precise characterization of its timing and magnitude. This paper introduces the reduction potential matrix, a novel approach to EV load flexibility modeling which is both straightforward to calculate and intuitive to interpret. This paper demonstrates the approach by quantifying flexibility for two distinct commercial vehicle groups--freight vehicles and transit buses--using simulated charging data from Virginia. While both groups are found to have substantial flexibility, its properties vary across the groups. Naturally, this variability manifests in differences in each group's role as a grid resource. The paper concludes with a discussion on how system planners, fleet operators, and other stakeholders can use the matrix to assess and leverage EV flexibility.
comment: 5 pages, 4 figures
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent 'cohesive' subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems
This paper presents a risk-aware safe reinforcement learning (RL) control design for stochastic discrete-time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk-informed safe controller is also learned besides the RL controller, and the RL and safe controllers are combined together. Several advantages come along with this approach: 1) High-confidence safety can be certified without relying on a high-fidelity system model and using limited data available, 2) Myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of two stabilizing controllers, and 3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise affine controllers are learned instead of linear controllers. To this end, the closed-loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safe violation of the closed-loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed-loop system is minimized. It is shown that this control-oriented approach reduces the data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data-driven interpolation technique is introduced. This method aims to maintain the RL agent's optimal implementation while ensuring its safety within environments characterized by noise. The study concludes with a simulation example that serves to validate the theoretical results.
comment: Submitted to Asian Journal of Control
Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints
This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents, where the nonconvex local loss and convex local constraint functions can vary arbitrarily across iterations, and the information of them is privately revealed to each agent at each iteration. For a uniformly jointly strongly connected time-varying directed graph, we propose two distributed bandit online primal--dual algorithm with compressed communication to efficiently utilize communication resources in the one-point and two-point bandit feedback settings, respectively. In nonconvex optimization, finding a globally optimal decision is often NP-hard. As a result, the standard regret metric used in online convex optimization becomes inapplicable. To measure the performance of the proposed algorithms, we use a network regret metric grounded in the first-order optimality condition associated with the variational inequality. We show that the compressed algorithms establish sublinear network regret and cumulative constraint violation bounds. Finally, a simulation example is presented to validate the theoretical results.
comment: 56 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2503.22410
Interval reduced-order switched positive observers for uncertain switched positive linear systems
In this paper, existence conditions and a design procedure of reduced-order switched positive observers for continuous- and discrete-time switched positive linear systems with uncertainty are established. In the analyzed class, arbitrary switching is permitted, whereas the uncertainty expressed via matrix inequalities concerns both the initial state and system parameters. Positive lower and positive upper interval switched observers are obtained. The proposed observers are of (n - p) order, where n is the dimension of the state vector and p is the rank of the output matrix, i.e., p-dimensional measurement information. Moreover, as a special case, existence conditions and a design procedure of reduced-order positive observers for uncertain positive linear systems without switching are provided. The theoretical findings are illustrated by two numerical examples for continuous- and discrete-time systems.
comment: 11 pages, 4 figures
Guaranteed Rejection-free Sampling Method Using Past Behaviours for Motion Planning of Autonomous Systems
The paper presents a novel learning-based sampling strategy that guarantees rejection-free sampling of the free space under both biased and approximately uniform conditions, leveraging multivariate kernel densities. Historical data from a given autonomous system is leveraged to estimate a non-parametric probabilistic description of the domain, which also describes the free space where feasible solutions of the motion planning problem are likely to be found. The tuning parameters of the kernel density estimator, the bandwidth and the kernel, are used to alter the description of the free space so that no samples can fall outside the originally defined space.The proposed method is demonstrated in two real-life case studies: An autonomous surface vessel (2D) and an autonomous drone (3D). Two planning problems are solved, showing that the proposed approximately uniform sampling scheme is capable of guaranteeing rejection-free samples of the considered workspace. Furthermore, the effectiveness of the proposed method is statistically validated using Monte Carlo simulations.
comment: Accepted for publication in Robotics and Autonomous Systems
Hybrid Resolver Model Generalization for Fault Condition Modeling: A Promising Tool for Reliability Study
Resolvers, like all electromagnetic devices, are constantly under investigation, both operationally and structurally. In this regard, proposing a modeling methodology that can save significant time without compromising accuracy is a big honor. In this study, a generalized hybrid model is suggested that, in addition to the above benefits, has sufficient capability to ease reliability study in the field of resolvers, where a large number of faulty conditions must be investigated under different operating conditions, including changes in angular velocity, voltage, and frequency of excitation; all of which are highlighted in the context of fault coverage. This model also serves as a promising tool for generating large datasets, which is advantageous for fault diagnosis. A resolver with a non-uniform air gap is chosen as a case study to challenge the suggested model, particularly in relation to eccentricity faults. We generalize the suggested model to account for the most common faulty conditions of resolvers: in-turn short circuits in signal and excitation windings, as well as static and dynamic eccentricity faults. The close agreement between the results of the suggested model and those from Time-Stepping Finite Element Analysis (TS-FEA), along with significant time savings in both healthy and faulty conditions, highlights the generality and proficiency of the suggested model. Finally, the case study is prototyped, and we verify the accuracy of the suggested model experimentally.
Kernel-based error bounds of bilinear Koopman surrogate models for nonlinear data-driven control
We derive novel deterministic bounds on the approximation error of data-based bilinear surrogate models for unknown nonlinear systems. The surrogate models are constructed using kernel-based extended dynamic mode decomposition to approximate the Koopman operator in a reproducing kernel Hilbert space. Unlike previous methods that require restrictive assumptions on the invariance of the dictionary, our approach leverages kernel-based dictionaries that allow us to control the projection error via pointwise error bounds, overcoming a significant limitation of existing theoretical guarantees. The derived state- and input-dependent error bounds allow for direct integration into Koopman-based robust controller designs with closed-loop guarantees for the unknown nonlinear system. Numerical examples illustrate the effectiveness of the proposed framework.
On Using Curved Mirrors to Decrease Shadowing in VLC
Visible light communication (VLC) complements radio frequency in indoor environments with large wireless data traffic. However, VLC is hindered by dramatic path losses when an opaque object is interposed between the transmitter and the receiver. Prior works propose the use of plane mirrors as optical reconfigurable intelligent surfaces (ORISs) to enhance communications through non-line-of-sight links. Plane mirrors rely on their orientation to forward the light to the target user location, which is challenging to implement in practice. This paper studies the potential of curved mirrors as static reflective surfaces to provide a broadening specular reflection that increases the signal coverage in mirror-assisted VLC scenarios. We study the behavior of paraboloid and semi-spherical mirrors and derive the irradiance equations. We provide extensive numerical and analytical results and show that curved mirrors, when developed with proper dimensions, may reduce the shadowing probability to zero, while static plane mirrors of the same size have shadowing probabilities larger than 65%. Furthermore, the signal-to-noise ratio offered by curved mirrors may suffice to provide connectivity to users deployed in the room even when a line-of-sight link blockage occurs.
comment: Accepted to be published in IEEE Globecom 2024
Soft Arm-Motor Thrust Characterization for a Pneumatically Actuated Soft Morphing Quadrotor
In this work, an experimental characterization of the configuration space of a soft, pneumatically actuated morphing quadrotor is presented, with a focus on precise thrust characterization of its flexible arms, considering the effect of downwash. Unlike traditional quadrotors, the soft drone has pneumatically actuated arms, introducing complex, nonlinear interactions between motor thrust and arm deformation, which make precise control challenging. The silicone arms are actuated using differential pressure to achieve flexibility and thus have a variable workspace compared to their fixed counter-parts. The deflection of the soft arms during compression and expansion is controlled throughout the flight. However, in real time, the downwash from the motor attached at the tip of the soft arm generates a significant and random disturbance on the arm. This disturbance affects both the desired deflection of the arm and the overall stability of the system. To address this factor, an experimental characterization of the effect of downwash on the deflection angle of the arm is conducted.
comment: This extended abstract was accepted for RoboSoft Conference, 2025 but later withdrawn
A Spectrum-based Filter Design for Periodic Control of Systems with Time Delay
A fully analytical controller design is proposed to tackle a periodic control problem for stable linear systems with an input delay. Applying the internal model control scheme, the controller design reduces to designing a filter, which is done through the placement of poles and zeros. The zeros are placed to compensate for the harmonics and to achieve the desired degree of properness for the filter. For placing the poles, a quasi-optimal procedure is proposed utilizing the standard LQR method. Given the high-dimensionality of the filter due to targeting a large number of harmonics, the design, as well as controller implementation, is performed over a state-space representation. A thorough experimental case study is included to demonstrate both the practical feasibility and effectiveness of the proposed control design. The experimental validation is performed on a physical system, the goal of which is to reject periodic vibrations acting on a mass-spring-damper setup where the sensor and the actuator are non-collocated.
comment: 25 pages, 10 figures, accepted 15 January 2025, Available online 22 January 2025, Version of Record 28 January 2025
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans? ICRA2025
Designing planners and controllers for contact-rich manipulation is extremely challenging as contact violates the smoothness conditions that many gradient-based controller synthesis tools assume. Contact smoothing approximates a non-smooth system with a smooth one, allowing one to use these synthesis tools more effectively. However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored. This paper analyzes the efficacy of linear controller synthesis using differential simulators based on contact smoothing. We introduce natural baselines for leveraging contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains to stabilize around open-loop plans. Using robotic bimanual whole-body manipulation as a testbed, we perform extensive empirical experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans. The video summarizing this paper and hardware experiments is found here: https://youtu.be/HLaKi6qbwQg?si=_zCAmBBD6rGSitm9.
comment: ICRA2025
Triple-identity Authentication: The Future of Secure Access
In a typical authentication process, the local system verifies the user's identity using a stored hash value generated by a cross-system hash algorithm. This article shifts the research focus from traditional password encryption to the establishment of gatekeeping mechanisms for effective interactions between a system and the outside world. Here, we propose a triple-identity authentication system to achieve this goal. Specifically, this local system opens the inner structure of its hash algorithm to all user credentials, including the login name, login password, and authentication password. When a login credential is entered, the local system hashes it and then creates a unique identifier using intermediate hash elements randomly selected from the open algorithm. Importantly, this locally generated unique identifier (rather than the stored hash produced by the open algorithm) is utilized to verify the user's combined identity, which is generated by combining the entered credential with the International Mobile Equipment Identity and the International Mobile Subscriber Identity. The verification process is implemented at each interaction point: the login name field, the login password field, and the server's authentication point. Thus, within the context of this triple-identity authentication system, we establish a robust gatekeeping mechanism for system interactions, ultimately providing a level of security that is equivalent to multi-factor authentication.
comment: 10 pages, 2 figures,
Correct-by-construction requirement decomposition
In systems engineering, accurately decomposing requirements is crucial for creating well-defined and manageable system components, particularly in safety-critical domains. Despite the critical need, rigorous, top-down methodologies for effectively breaking down complex requirements into precise, actionable sub-requirements are scarce, especially compared to the wealth of bottom-up verification techniques. Addressing this gap, we introduce a formal decomposition for contract-based design that guarantees the correctness of decomposed requirements if specific conditions are met. Our (semi-)automated methodology augments contract-based design with reachability analysis and constraint programming to systematically identify, verify, and validate sub-requirements representable by continuous bounded sets -- continuous relations between real-valued inputs and outputs. We demonstrate the efficacy and practicality of a correct-by-construction approach through a comprehensive case study on a cruise control system, highlighting how our methodology improves the interpretability, tractability, and verifiability of system requirements.
Systems and Control (EESS)
Regulation without calibration
This article revisits the importance of the internal model principle in the literature of regulation and synchronization. Trajectory regulation, the task of regulating continuous-time signals generated by differential equations, is contrasted with event regulation, the task of only regulating discrete events associated with the trajectories. In trajectory regulation, the internal model principle requires an exact internal generator of the continuous-time trajectories, which translates into unrealistic calibration requirements. Event regulation is envisioned as a way to relieve calibration of the continuous behavior while ensuring reliability of the discrete events.
comment: Submitted to IEEE Control Systems Magazine
Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps
Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.
Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments
This work extends our prior work on the distributed nonlinear model predictive control (NMPC) for navigating a robot fleet following a certain flocking behavior in unknown obstructed environments with a more realistic local obstacle avoidance strategy. More specifically, we integrate the local obstacle avoidance constraint using point clouds into the NMPC framework. Here, each agent relies on data from its local sensor to perceive and respond to nearby obstacles. A point cloud processing technique is presented for both two-dimensional and three-dimensional point clouds to minimize the computational burden during the optimization. The process consists of directional filtering and down-sampling that significantly reduce the number of data points. The algorithm's performance is validated through realistic 3D simulations in Gazebo, and its practical feasibility is further explored via hardware-in-the-loop (HIL) simulations on embedded platforms.
comment: 22 pages, 14 figures, to be published in Frontiers in Robotics and AI
Improved Corner Cutting Constraints for Mixed-Integer Motion Planning of a Differential Drive Micro-Mobility Vehicle
This paper addresses the problem of motion planning for differential drive micro-mobility platforms. This class of vehicle is designed to perform small-distance transportation of passengers and goods in structured environments. Our approach leverages mixed-integer linear programming (MILP) to compute global optimal collision-free trajectories taking into account the kinematics and dynamics of the vehicle. We propose novel constraints for intersample collision avoidance and demonstrate its effectiveness using pick-up and delivery missions and statistical analysis of Monte Carlo simulations. The results show that the novel formulation provides the best trajectories in terms of time expenditure and control effort when compared to two state-of-the-art approaches.
Coordinated Multi-Valve Disturbance-Rejection Pressure Control for High-Altitude Test Stands via Exterior Penalty Functions
High altitude simulation test benches for aero engines employ multi chamber, multi valve intake systems that demand effective decoupling and strong disturbance rejection during transient tests. This paper proposes a coordinated active disturbance rejection control (ADRC) scheme based on an external penalty function. The chamber pressure safety limit is reformulated as an inequality constrained optimization problem, and an exponential penalty together with a gradient based algorithm is designed for dynamic constraint relaxation, with global convergence rigorously proven. A coordination term is then integrated into a distributed ADRC framework to yield a multi valve coordinated LADRC controller, whose asymptotic stability is established via Lyapunov theory. Hardware in the loop simulations using MATLAB/Simulink and a PLC demonstrate that, under $\pm$3 kPa pressure constraints, chamber V2's maximum error is 1.782 kPa (77.1\% lower than PID control), and under a 180 kg/s^2 flow rate disturbance, valve oscillations decrease from $\pm$27\% to $\pm$5\% (an 81.5\% reduction). These results confirm the proposed method's superior disturbance rejection and decoupling performance.
Adaptive control for multi-scale stochastic dynamical systems with stochastic next generation reservoir computing
The rapid advancement of neuroscience and machine learning has established data-driven stochastic dynamical system modeling as a powerful tool for understanding and controlling high-dimensional, spatio-temporal processes. We introduce the stochastic next-generation reservoir computing (NG-RC) controller, a framework that integrates the computational efficiency of NG-RC with stochastic analysis to enable robust event-triggered control in multiscale stochastic systems. The asymptotic stability of the controller is rigorously proven via an extended stochastic LaSalle theorem, providing theoretical guarantees for amplitude regulation in nonlinear stochastic dynamics. Numerical experiments on a stochastic Van-der-Pol system subject to both additive and multiplicative noise validate the algorithm, demonstrating its convergence rate across varying temporal scales and noise intensities. To bridge theoretical insights with real-world applications, we deploy the controller to modulate pathological dynamics reconstructed from epileptic EEG data. This work advances a theoretically guaranteed scalable framework for adaptive control of stochastic systems, with broad potential for data-driven decision making in engineering, neuroscience, and beyond.
comment: 30 pages, 14 figures
Data-driven Internal Model Control for Output Regulation
Output regulation is a fundamental problem in control theory, extensively studied since the 1970s. Traditionally, research has primarily addressed scenarios where the system model is explicitly known, leaving the problem in the absence of a system model less explored. Leveraging the recent advancements in Willems et al.'s fundamental lemma, data-driven control has emerged as a powerful tool for stabilizing unknown systems. This paper tackles the output regulation problem for unknown single and multi-agent systems (MASs) using noisy data. Previous approaches have attempted to solve data-based output regulation equations (OREs), which are inadequate for achieving zero tracking error with noisy data. To circumvent the need for solving data-based OREs, we propose an internal model-based data-driven controller that reformulates the output regulation problem into a stabilization problem. This method is first applied to linear time-invariant (LTI) systems, demonstrating exact solution capabilities, i.e., zero tracking error, through solving a straightforward data-based linear matrix inequality (LMI). Furthermore, we extend our approach to solve the $k$th-order output regulation problem for nonlinear systems. Extensions to both linear and nonlinear MASs are discussed. Finally, numerical tests validate the effectiveness and correctness of the proposed controllers.
Some Computational Tools for Solving a Selection of Problems in Control Theory
This paper demonstrates how certified computational tools can be used to address various problems in control theory. In particular, we introduce PACE.jl, a Julia package that implements symbolic elimination techniques, including (among others) discriminant varieties and Rational Univariate Representation, while also supporting multi-precision interval computations. We showcase its applications to key control theory problems, including identification, stability analysis, and optimization, for both parameter-dependent and parameter-free systems.
Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: Firstly, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and secondly, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments - using wave-disturbed motion data - demonstrate that our approach achieves a high success rate - above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
comment: 12 pages, 15 figures
Model Identification Adaptive Control with $ρ$-POMDP Planning
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design (IID) and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
comment: Accepted to CoDIT 2025
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
comment: 10 pages, 2 figures
Reach-Avoid-Stabilize Using Admissible Control Sets
Hamilton-Jacobi Reachability (HJR) analysis has been successfully used in many robotics and control tasks, and is especially effective in computing reach-avoid sets and control laws that enable an agent to reach a goal while satisfying state constraints. However, the original HJR formulation provides no guarantees of safety after a) the prescribed time horizon, or b) goal satisfaction. The reach-avoid-stabilize (RAS) problem has therefore gained a lot of focus: find the set of initial states (the RAS set), such that the trajectory can reach the target, and stabilize to some point of interest (POI) while avoiding obstacles. Solving RAS problems using HJR usually requires defining a new value function, whose zero sub-level set is the RAS set. The existing methods do not consider the problem when there are a series of targets to reach and/or obstacles to avoid. We propose a method that uses the idea of admissible control sets; we guarantee that the system will reach each target while avoiding obstacles as prescribed by the given time series. Moreover, we guarantee that the trajectory ultimately stabilizes to the POI. The proposed method provides an under-approximation of the RAS set, guaranteeing safety. Numerical examples are provided to validate the theory.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control
``Sim2real gap", in which the system learned in simulations is not the exact representation of the real system, can lead to loss of stability and performance when controllers learned using data from the simulated system are used on the real system. In this work, we address this challenge in the linear quadratic regulator (LQR) setting. Specifically, we consider an LQR problem for a system with unknown system matrices. Along with the state-action pairs from the system to be controlled, a trajectory of length $S$ of state-action pairs from a different unknown system is available. Our proposed algorithm is constructed upon Thompson sampling and utilizes the mean as well as the uncertainty of the dynamics of the system from which the trajectory of length $S$ is obtained. We establish that the algorithm achieves $\tilde{\mathcal{O}}({f(S,M_{\delta})\sqrt{T/S}})$ Bayes regret after $T$ time steps, where $M_{\delta}$ characterizes the \emph{dissimilarity} between the two systems and $f(S,M_{\delta})$ is a function of $S$ and $M_{\delta}$. When $M_{\delta}$ is sufficiently small, the proposed algorithm achieves $\tilde{\mathcal{O}}({\sqrt{T/S}})$ Bayes regret and outperforms a naive strategy which does not utilize the available trajectory.
Hamilton's Rule for Enabling Altruism in Multi-Agent Systems
This paper explores the application of Hamilton's rule to altruistic decision-making in multi-agent systems. Inspired by biological altruism, we introduce a framework that evaluates when individual agents should incur costs to benefit their neighbors. By adapting Hamilton's rule, we define agent ``fitness" in terms of task productivity rather than genetic survival. We formalize altruistic decision-making through a graph-based model of multi-agent interactions and propose a solution using collaborative control Lyapunov functions. The approach ensures that altruistic behaviors contribute to the collective goal-reaching efficiency of the system. We illustrate this framework on a multi-agent way-point navigation problem, where we show through simulation how agent importance levels influence altruistic decision-making, leading to improved coordination in navigation tasks.
Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses
State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts law task that involved the aperture of the cursor and the control of orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
Measuring Flexibility through Reduction Potential
While electric vehicles (EVs) often exhibit substantial flexibility, harnessing this flexibility requires precise characterization of its timing and magnitude. This paper introduces the reduction potential matrix, a novel approach to EV load flexibility modeling which is both straightforward to calculate and intuitive to interpret. This paper demonstrates the approach by quantifying flexibility for two distinct commercial vehicle groups--freight vehicles and transit buses--using simulated charging data from Virginia. While both groups are found to have substantial flexibility, its properties vary across the groups. Naturally, this variability manifests in differences in each group's role as a grid resource. The paper concludes with a discussion on how system planners, fleet operators, and other stakeholders can use the matrix to assess and leverage EV flexibility.
comment: 5 pages, 4 figures
On Signed Network Coordination Games
We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strenght and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent 'cohesive' subset of players, who are either connected exclusively by positive weights, or forms a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
comment: 14 pages, 8 figures
Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems
This paper presents a risk-aware safe reinforcement learning (RL) control design for stochastic discrete-time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk-informed safe controller is also learned besides the RL controller, and the RL and safe controllers are combined together. Several advantages come along with this approach: 1) High-confidence safety can be certified without relying on a high-fidelity system model and using limited data available, 2) Myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of two stabilizing controllers, and 3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise affine controllers are learned instead of linear controllers. To this end, the closed-loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safe violation of the closed-loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed-loop system is minimized. It is shown that this control-oriented approach reduces the data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data-driven interpolation technique is introduced. This method aims to maintain the RL agent's optimal implementation while ensuring its safety within environments characterized by noise. The study concludes with a simulation example that serves to validate the theoretical results.
comment: Submitted to Asian Journal of Control
Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints
This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents, where the nonconvex local loss and convex local constraint functions can vary arbitrarily across iterations, and the information of them is privately revealed to each agent at each iteration. For a uniformly jointly strongly connected time-varying directed graph, we propose two distributed bandit online primal--dual algorithm with compressed communication to efficiently utilize communication resources in the one-point and two-point bandit feedback settings, respectively. In nonconvex optimization, finding a globally optimal decision is often NP-hard. As a result, the standard regret metric used in online convex optimization becomes inapplicable. To measure the performance of the proposed algorithms, we use a network regret metric grounded in the first-order optimality condition associated with the variational inequality. We show that the compressed algorithms establish sublinear network regret and cumulative constraint violation bounds. Finally, a simulation example is presented to validate the theoretical results.
comment: 56 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2503.22410
Interval reduced-order switched positive observers for uncertain switched positive linear systems
In this paper, existence conditions and a design procedure of reduced-order switched positive observers for continuous- and discrete-time switched positive linear systems with uncertainty are established. In the analyzed class, arbitrary switching is permitted, whereas the uncertainty expressed via matrix inequalities concerns both the initial state and system parameters. Positive lower and positive upper interval switched observers are obtained. The proposed observers are of (n - p) order, where n is the dimension of the state vector and p is the rank of the output matrix, i.e., p-dimensional measurement information. Moreover, as a special case, existence conditions and a design procedure of reduced-order positive observers for uncertain positive linear systems without switching are provided. The theoretical findings are illustrated by two numerical examples for continuous- and discrete-time systems.
comment: 11 pages, 4 figures
Guaranteed Rejection-free Sampling Method Using Past Behaviours for Motion Planning of Autonomous Systems
The paper presents a novel learning-based sampling strategy that guarantees rejection-free sampling of the free space under both biased and approximately uniform conditions, leveraging multivariate kernel densities. Historical data from a given autonomous system is leveraged to estimate a non-parametric probabilistic description of the domain, which also describes the free space where feasible solutions of the motion planning problem are likely to be found. The tuning parameters of the kernel density estimator, the bandwidth and the kernel, are used to alter the description of the free space so that no samples can fall outside the originally defined space.The proposed method is demonstrated in two real-life case studies: An autonomous surface vessel (2D) and an autonomous drone (3D). Two planning problems are solved, showing that the proposed approximately uniform sampling scheme is capable of guaranteeing rejection-free samples of the considered workspace. Furthermore, the effectiveness of the proposed method is statistically validated using Monte Carlo simulations.
comment: Accepted for publication in Robotics and Autonomous Systems
Hybrid Resolver Model Generalization for Fault Condition Modeling: A Promising Tool for Reliability Study
Resolvers, like all electromagnetic devices, are constantly under investigation, both operationally and structurally. In this regard, proposing a modeling methodology that can save significant time without compromising accuracy is a big honor. In this study, a generalized hybrid model is suggested that, in addition to the above benefits, has sufficient capability to ease reliability study in the field of resolvers, where a large number of faulty conditions must be investigated under different operating conditions, including changes in angular velocity, voltage, and frequency of excitation; all of which are highlighted in the context of fault coverage. This model also serves as a promising tool for generating large datasets, which is advantageous for fault diagnosis. A resolver with a non-uniform air gap is chosen as a case study to challenge the suggested model, particularly in relation to eccentricity faults. We generalize the suggested model to account for the most common faulty conditions of resolvers: in-turn short circuits in signal and excitation windings, as well as static and dynamic eccentricity faults. The close agreement between the results of the suggested model and those from Time-Stepping Finite Element Analysis (TS-FEA), along with significant time savings in both healthy and faulty conditions, highlights the generality and proficiency of the suggested model. Finally, the case study is prototyped, and we verify the accuracy of the suggested model experimentally.
Kernel-based error bounds of bilinear Koopman surrogate models for nonlinear data-driven control
We derive novel deterministic bounds on the approximation error of data-based bilinear surrogate models for unknown nonlinear systems. The surrogate models are constructed using kernel-based extended dynamic mode decomposition to approximate the Koopman operator in a reproducing kernel Hilbert space. Unlike previous methods that require restrictive assumptions on the invariance of the dictionary, our approach leverages kernel-based dictionaries that allow us to control the projection error via pointwise error bounds, overcoming a significant limitation of existing theoretical guarantees. The derived state- and input-dependent error bounds allow for direct integration into Koopman-based robust controller designs with closed-loop guarantees for the unknown nonlinear system. Numerical examples illustrate the effectiveness of the proposed framework.
On Using Curved Mirrors to Decrease Shadowing in VLC
Visible light communication (VLC) complements radio frequency in indoor environments with large wireless data traffic. However, VLC is hindered by dramatic path losses when an opaque object is interposed between the transmitter and the receiver. Prior works propose the use of plane mirrors as optical reconfigurable intelligent surfaces (ORISs) to enhance communications through non-line-of-sight links. Plane mirrors rely on their orientation to forward the light to the target user location, which is challenging to implement in practice. This paper studies the potential of curved mirrors as static reflective surfaces to provide a broadening specular reflection that increases the signal coverage in mirror-assisted VLC scenarios. We study the behavior of paraboloid and semi-spherical mirrors and derive the irradiance equations. We provide extensive numerical and analytical results and show that curved mirrors, when developed with proper dimensions, may reduce the shadowing probability to zero, while static plane mirrors of the same size have shadowing probabilities larger than 65%. Furthermore, the signal-to-noise ratio offered by curved mirrors may suffice to provide connectivity to users deployed in the room even when a line-of-sight link blockage occurs.
comment: Accepted to be published in IEEE Globecom 2024
Soft Arm-Motor Thrust Characterization for a Pneumatically Actuated Soft Morphing Quadrotor
In this work, an experimental characterization of the configuration space of a soft, pneumatically actuated morphing quadrotor is presented, with a focus on precise thrust characterization of its flexible arms, considering the effect of downwash. Unlike traditional quadrotors, the soft drone has pneumatically actuated arms, introducing complex, nonlinear interactions between motor thrust and arm deformation, which make precise control challenging. The silicone arms are actuated using differential pressure to achieve flexibility and thus have a variable workspace compared to their fixed counter-parts. The deflection of the soft arms during compression and expansion is controlled throughout the flight. However, in real time, the downwash from the motor attached at the tip of the soft arm generates a significant and random disturbance on the arm. This disturbance affects both the desired deflection of the arm and the overall stability of the system. To address this factor, an experimental characterization of the effect of downwash on the deflection angle of the arm is conducted.
comment: This extended abstract was accepted for RoboSoft Conference, 2025 but later withdrawn
A Spectrum-based Filter Design for Periodic Control of Systems with Time Delay
A fully analytical controller design is proposed to tackle a periodic control problem for stable linear systems with an input delay. Applying the internal model control scheme, the controller design reduces to designing a filter, which is done through the placement of poles and zeros. The zeros are placed to compensate for the harmonics and to achieve the desired degree of properness for the filter. For placing the poles, a quasi-optimal procedure is proposed utilizing the standard LQR method. Given the high-dimensionality of the filter due to targeting a large number of harmonics, the design, as well as controller implementation, is performed over a state-space representation. A thorough experimental case study is included to demonstrate both the practical feasibility and effectiveness of the proposed control design. The experimental validation is performed on a physical system, the goal of which is to reject periodic vibrations acting on a mass-spring-damper setup where the sensor and the actuator are non-collocated.
comment: 25 pages, 10 figures, accepted 15 January 2025, Available online 22 January 2025, Version of Record 28 January 2025
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans? ICRA2025
Designing planners and controllers for contact-rich manipulation is extremely challenging as contact violates the smoothness conditions that many gradient-based controller synthesis tools assume. Contact smoothing approximates a non-smooth system with a smooth one, allowing one to use these synthesis tools more effectively. However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored. This paper analyzes the efficacy of linear controller synthesis using differential simulators based on contact smoothing. We introduce natural baselines for leveraging contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains to stabilize around open-loop plans. Using robotic bimanual whole-body manipulation as a testbed, we perform extensive empirical experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans. The video summarizing this paper and hardware experiments is found here: https://youtu.be/HLaKi6qbwQg?si=_zCAmBBD6rGSitm9.
comment: ICRA2025
Triple-identity Authentication: The Future of Secure Access
In a typical authentication process, the local system verifies the user's identity using a stored hash value generated by a cross-system hash algorithm. This article shifts the research focus from traditional password encryption to the establishment of gatekeeping mechanisms for effective interactions between a system and the outside world. Here, we propose a triple-identity authentication system to achieve this goal. Specifically, this local system opens the inner structure of its hash algorithm to all user credentials, including the login name, login password, and authentication password. When a login credential is entered, the local system hashes it and then creates a unique identifier using intermediate hash elements randomly selected from the open algorithm. Importantly, this locally generated unique identifier (rather than the stored hash produced by the open algorithm) is utilized to verify the user's combined identity, which is generated by combining the entered credential with the International Mobile Equipment Identity and the International Mobile Subscriber Identity. The verification process is implemented at each interaction point: the login name field, the login password field, and the server's authentication point. Thus, within the context of this triple-identity authentication system, we establish a robust gatekeeping mechanism for system interactions, ultimately providing a level of security that is equivalent to multi-factor authentication.
comment: 10 pages, 2 figures,
Correct-by-construction requirement decomposition
In systems engineering, accurately decomposing requirements is crucial for creating well-defined and manageable system components, particularly in safety-critical domains. Despite the critical need, rigorous, top-down methodologies for effectively breaking down complex requirements into precise, actionable sub-requirements are scarce, especially compared to the wealth of bottom-up verification techniques. Addressing this gap, we introduce a formal decomposition for contract-based design that guarantees the correctness of decomposed requirements if specific conditions are met. Our (semi-)automated methodology augments contract-based design with reachability analysis and constraint programming to systematically identify, verify, and validate sub-requirements representable by continuous bounded sets -- continuous relations between real-valued inputs and outputs. We demonstrate the efficacy and practicality of a correct-by-construction approach through a comprehensive case study on a cruise control system, highlighting how our methodology improves the interpretability, tractability, and verifiability of system requirements.
Robotics
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Mimicry is a fundamental learning mechanism in humans, enabling individuals to learn new tasks by observing and imitating experts. However, applying this ability to robots presents significant challenges due to the inherent differences between human and robot embodiments in both their visual appearance and physical capabilities. While previous methods bridge this gap using cross-embodiment datasets with shared scenes and tasks, collecting such aligned data between humans and robots at scale is not trivial. In this paper, we propose UniSkill, a novel framework that learns embodiment-agnostic skill representations from large-scale cross-embodiment video data without any labels, enabling skills extracted from human video prompts to effectively transfer to robot policies trained only on robot data. Our experiments in both simulation and real-world environments show that our cross-embodiment skills successfully guide robots in selecting appropriate actions, even with unseen video prompts. The project website can be found at: https://kimhanjung.github.io/UniSkill.
comment: Project Page: https://kimhanjung.github.io/UniSkill/
Optimal Trajectory Planning with Collision Avoidance for Autonomous Vehicle Maneuvering
To perform autonomous driving maneuvers, such as parallel or perpendicular parking, a vehicle requires continual speed and steering adjustments to follow a generated path. In consequence, the path's quality is a limiting factor of the vehicle maneuver's performance. While most path planning approaches include finding a collision-free route, optimal trajectory planning involves solving the best transition from initial to final states, minimizing the action over all paths permitted by a kinematic model. Here we propose a novel method based on sequential convex optimization, which permits flexible and efficient optimal trajectory generation. The objective is to achieve the fastest time, shortest distance, and fewest number of path segments to satisfy motion requirements, while avoiding sensor blind-spots. In our approach, vehicle kinematics are represented by a discretized Dubins model. To avoid collisions, each waypoint is constrained by linear inequalities representing closest distance of obstacles to a polygon specifying the vehicle's extent. To promote smooth and valid trajectories, the solved kinematic state and control variables are constrained and regularized by penalty terms in the model's cost function, which enforces physical restrictions including limits for steering angle, acceleration and speed. In this paper, we analyze trajectories obtained for several parking scenarios. Results demonstrate efficient and collision-free motion generated by the proposed technique.
comment: SIAM Conference on Control and Its Applications, July 28-30th, 2025, Montreal, Canada
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework trained solely in simulation and can zero-shot transfer to different embodiments in diverse real-world environments. The key ingredient of NavDP's network is the combination of diffusion-based trajectory generation and a critic function for trajectory selection, which are conditioned on only local observation tokens encoded from a shared policy transformer. Given the privileged information of the global environment in simulation, we scale up the demonstrations of good quality to train the diffusion policy and formulate the critic value function targets with contrastive negative samples. Our demonstration generation approach achieves about 2,500 trajectories/GPU per day, 20$\times$ more efficient than real-world data collection, and results in a large-scale navigation dataset with 363.2km trajectories across 1244 scenes. Trained with this simulation dataset, NavDP achieves state-of-the-art performance and consistently outstanding generalization capability on quadruped, wheeled, and humanoid robots in diverse indoor and outdoor environments. In addition, we present a preliminary attempt at using Gaussian Splatting to make in-domain real-to-sim fine-tuning to further bridge the sim-to-real gap. Experiments show that adding such real-to-sim data can improve the success rate by 30\% without hurting its generalization capability.
comment: 14 pages, 6 figures
A Social Robot with Inner Speech for Dietary Guidance
We explore the use of inner speech as a mechanism to enhance transparency and trust in social robots for dietary advice. In humans, inner speech structures thought processes and decision-making; in robotics, it improves explainability by making reasoning explicit. This is crucial in healthcare scenarios, where trust in robotic assistants depends on both accurate recommendations and human-like dialogue, which make interactions more natural and engaging. Building on this, we developed a social robot that provides dietary advice, and we provided the architecture with inner speech capabilities to validate user input, refine reasoning, and generate clear justifications. The system integrates large language models for natural language understanding and a knowledge graph for structured dietary information. By making decisions more transparent, our approach strengthens trust and improves human-robot interaction in healthcare. We validated this by measuring the computational efficiency of our architecture and conducting a small user study, which assessed the reliability of inner speech in explaining the robot's behavior.
A Comparative Study of Human Activity Recognition: Motion, Tactile, and multi-modal Approaches
Human activity recognition (HAR) is essential for effective Human-Robot Collaboration (HRC), enabling robots to interpret and respond to human actions. This study evaluates the ability of a vision-based tactile sensor to classify 15 activities, comparing its performance to an IMU-based data glove. Additionally, we propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths. We examined three approaches: motion-based classification (MBC) using IMU data, tactile-based classification (TBC) with single or dual video streams, and multi-modal classification (MMC) integrating both. Offline validation on segmented datasets assessed each configuration's accuracy under controlled conditions, while online validation on continuous action sequences tested online performance. Results showed the multi-modal approach consistently outperformed single-modality methods, highlighting the potential of integrating tactile and motion sensing to enhance HAR systems for collaborative robotics.
DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting
This work presents DLO-Splatting, an algorithm for estimating the 3D shape of Deformable Linear Objects (DLOs) from multi-view RGB images and gripper state information through prediction-update filtering. The DLO-Splatting algorithm uses a position-based dynamics model with shape smoothness and rigidity dampening corrections to predict the object shape. Optimization with a 3D Gaussian Splatting-based rendering loss iteratively renders and refines the prediction to align it with the visual observations in the update step. Initial experiments demonstrate promising results in a knot tying scenario, which is challenging for existing vision-only methods.
comment: 5 pages, 2 figures, presented at the 2025 5th Workshop: Reflections on Representations and Manipulating Deformable Objects at the IEEE International Conference on Robotics and Automation. RMDO workshop (https://deformable-workshop.github.io/icra2025/)
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel calibration-free visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene without requiring additional training. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on several tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo and OpenVLA. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: augmented-reality-for-robots.github.io
Beyond Predefined Actions: Integrating Behavior Trees and Dynamic Movement Primitives for Robot Learning from Demonstration
Interpretable policy representations like Behavior Trees (BTs) and Dynamic Motion Primitives (DMPs) enable robot skill transfer from human demonstrations, but each faces limitations: BTs require expert-crafted low-level actions, while DMPs lack high-level task logic. We address these limitations by integrating DMP controllers into a BT framework, jointly learning the BT structure and DMP actions from single demonstrations, thereby removing the need for predefined actions. Additionally, by combining BT decision logic with DMP motion generation, our method enhances policy interpretability, modularity, and adaptability for autonomous systems. Our approach readily affords both learning to replicate low-level motions and combining partial demonstrations into a coherent and easy-to-modify overall policy.
comment: 14 pages, 6 figures, accepted (not yet published) at IAS19 2025 conference
Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations
We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criteria, our approach iteratively finds a weight improvement step and proposes a method to find an appropriate step size that ensures learned cost function features remain similar to the demonstrated trajectory features. In contrast to similar approaches, our algorithm can individually tune the effectiveness of each observation for the partition function and does not need a large sample set, enabling faster learning. We generate sample trajectories by solving an optimal control problem instead of random sampling, leading to more informative trajectories. The performance of our method is compared to two state of the art algorithms to demonstrate its benefits in several simulated environments.
MC-Swarm: Minimal-Communication Multi-Agent Trajectory Planning and Deadlock Resolution for Quadrotor Swarm
For effective multi-agent trajectory planning, it is important to consider lightweight communication and its potential asynchrony. This paper presents a distributed trajectory planning algorithm for a quadrotor swarm that operates asynchronously and requires no communication except during the initial planning phase. Moreover, our algorithm guarantees no deadlock under asynchronous updates and absence of communication during flight. To effectively ensure these points, we build two main modules: coordination state updater and trajectory optimizer. The coordination state updater computes waypoints for each agent toward its goal and performs subgoal optimization while considering deadlocks, as well as safety constraints with respect to neighbor agents and obstacles. Then, the trajectory optimizer generates a trajectory that ensures collision avoidance even with the asynchronous planning updates of neighboring agents. We provide a theoretical guarantee of collision avoidance with deadlock resolution and evaluate the effectiveness of our method in complex simulation environments, including random forests and narrow-gap mazes. Additionally, to reduce the total mission time, we design a faster coordination state update using lightweight communication. Lastly, our approach is validated through extensive simulations and real-world experiments with cluttered environment scenarios.
comment: 13 pages, 11 figures
MESSI: A Multi-Elevation Semantic Segmentation Image Dataset of an Urban Environment
This paper presents a Multi-Elevation Semantic Segmentation Image (MESSI) dataset comprising 2525 images taken by a drone flying over dense urban environments. MESSI is unique in two main features. First, it contains images from various altitudes, allowing us to investigate the effect of depth on semantic segmentation. Second, it includes images taken from several different urban regions (at different altitudes). This is important since the variety covers the visual richness captured by a drone's 3D flight, performing horizontal and vertical maneuvers. MESSI contains images annotated with location, orientation, and the camera's intrinsic parameters and can be used to train a deep neural network for semantic segmentation or other applications of interest (e.g., localization, navigation, and tracking). This paper describes the dataset and provides annotation details. It also explains how semantic segmentation was performed using several neural network models and shows several relevant statistics. MESSI will be published in the public domain to serve as an evaluation benchmark for semantic segmentation using images captured by a drone or similar vehicle flying over a dense urban environment.
End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion
Quadruped robots excel in traversing complex, unstructured environments where wheeled robots often fail. However, enabling efficient and adaptable locomotion remains challenging due to the quadrupeds' nonlinear dynamics, high degrees of freedom, and the computational demands of real-time control. Optimization-based controllers, such as Nonlinear Model Predictive Control (NMPC), have shown strong performance, but their reliance on accurate state estimation and high computational overhead makes deployment in real-world settings challenging. In this work, we present a Multi-Task Learning (MTL) framework in which expert NMPC demonstrations are used to train a single neural network to predict actions for multiple locomotion behaviors directly from raw proprioceptive sensor inputs. We evaluate our approach extensively on the quadruped robot Go1, both in simulation and on real hardware, demonstrating that it accurately reproduces expert behavior, allows smooth gait switching, and simplifies the control pipeline for real-time deployment. Our MTL architecture enables learning diverse gaits within a unified policy, achieving high $R^{2}$ scores for predicted joint targets across all tasks.
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we propose FSD (From Seeing to Doing), a novel vision-language model that generates intermediate representations through spatial relationship reasoning, providing fine-grained guidance for robotic manipulation. Our approach combines a hierarchical data pipeline for training with a self-consistency mechanism that aligns spatial coordinates with visual signals. Through extensive experiments, we comprehensively validated FSD's capabilities in both "seeing" and "doing," achieving outstanding performance across 8 benchmarks for general spatial reasoning and embodied reference abilities, as well as on our proposed more challenging benchmark VABench. We also verified zero-shot capabilities in robot manipulation, demonstrating significant performance improvements over baseline methods in both SimplerEnv and real robot settings. Experimental results show that FSD achieves 54.1% success rate in SimplerEnv and 72% success rate across 8 real-world tasks, outperforming the strongest baseline by 30%.
comment: Early version
FOCI: Trajectory Optimization on Gaussian Splats
3D Gaussian Splatting (3DGS) has recently gained popularity as a faster alternative to Neural Radiance Fields (NeRFs) in 3D reconstruction and view synthesis methods. Leveraging the spatial information encoded in 3DGS, this work proposes FOCI (Field Overlap Collision Integral), an algorithm that is able to optimize trajectories directly on the Gaussians themselves. FOCI leverages a novel and interpretable collision formulation for 3DGS using the notion of the overlap integral between Gaussians. Contrary to other approaches, which represent the robot with conservative bounding boxes that underestimate the traversability of the environment, we propose to represent the environment and the robot as Gaussian Splats. This not only has desirable computational properties, but also allows for orientation-aware planning, allowing the robot to pass through very tight and narrow spaces. We extensively test our algorithm in both synthetic and real Gaussian Splats, showcasing that collision-free trajectories for the ANYmal legged robot that can be computed in a few seconds, even with hundreds of thousands of Gaussians making up the environment. The project page and code are available at https://rffr.leggedrobotics.com/works/foci/
comment: 7 pages, 8 figures, Mario Gomez Andreu and Maximum Wilder-Smith contributed equally
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
PDDL-based symbolic task planning remains pivotal for robot autonomy yet struggles with dynamic human-robot collaboration due to scalability, re-planning demands, and delayed plan availability. Although a few neurosymbolic frameworks have previously leveraged LLMs such as GPT-3 to address these challenges, reliance on closed-source, remote models with limited context introduced critical constraints: third-party dependency, inconsistent response times, restricted plan length and complexity, and multi-domain scalability issues. We present Gideon, a novel framework that enables the transition to modern, smaller, local LLMs with extended context length. Gideon integrates a novel problem generator to systematically generate large-scale datasets of realistic domain-problem-plan tuples for any domain, and adapts neurosymbolic planning for local LLMs, enabling on-device execution and extended context for multi-domain support. Preliminary experiments in single-domain scenarios performed on Qwen-2.5 1.5B and trained on 8k-32k samples, demonstrate a valid plan percentage of 66.1% (32k model) and show that the figure can be further scaled through additional data. Multi-domain tests on 16k samples yield an even higher 70.6% planning validity rate, proving extensibility across domains and signaling that data variety can have a positive effect on learning efficiency. Although long-horizon planning and reduced model size make Gideon training much less efficient than baseline models based on larger LLMs, the results are still significant considering that the trained model is about 120x smaller than baseline and that significant advantages can be achieved in inference efficiency, scalability, and multi-domain adaptability, all critical factors in human-robot collaboration. Training inefficiency can be mitigated by Gideon's streamlined data generation pipeline.
comment: 19 pages, 3 figures, 4 tables, accepted at IAS 2025
Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting
This paper presents a comprehensive sim-to-real pipeline for autonomous strawberry picking from dense clusters using a Franka Panda robot. Our approach leverages a custom Mujoco simulation environment that integrates domain randomization techniques. In this environment, a deep reinforcement learning agent is trained using the dormant ratio minimization algorithm. The proposed pipeline bridges low-level control with high-level perception and decision making, demonstrating promising performance in both simulation and in a real laboratory environment, laying the groundwork for successful transfer to real-world autonomous fruit harvesting.
Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges
Causal understanding is important in many disciplines of science and engineering, where we seek to understand how different factors in the system causally affect an experiment or situation and pave a pathway towards creating effective or optimising existing models. Examples of use cases are autonomous exploration and modelling of unknown environments or assessing key variables in optimising large complex systems. In this paper, we analyse a Reinforcement Learning approach called Causal Curiosity, which aims to estimate as accurately and efficiently as possible, without directly measuring them, the value of factors that causally determine the dynamics of a system. Whilst the idea presents a pathway forward, measurement accuracy is the foundation of methodology effectiveness. Focusing on the current causal curiosity's robotic manipulator, we present for the first time a measurement accuracy analysis of the future potentials and current limitations of this technique and an analysis of its sensitivity and confounding factor disentanglement capability - crucial for causal analysis. As a result of our work, we promote proposals for an improved and efficient design of Causal Curiosity methods to be applied to real-world complex scenarios.
comment: 24 pages, 10 figures, 9 tables
Symbolically-Guided Visual Plan Inference from Uncurated Video Data
Visual planning, by offering a sequence of intermediate visual subgoals to a goal-conditioned low-level policy, achieves promising performance on long-horizon manipulation tasks. To obtain the subgoals, existing methods typically resort to video generation models but suffer from model hallucination and computational cost. We present Vis2Plan, an efficient, explainable and white-box visual planning framework powered by symbolic guidance. From raw, unlabeled play data, Vis2Plan harnesses vision foundation models to automatically extract a compact set of task symbols, which allows building a high-level symbolic transition graph for multi-goal, multi-stage planning. At test time, given a desired task goal, our planner conducts planning at the symbolic level and assembles a sequence of physically consistent intermediate sub-goal images grounded by the underlying symbolic representation. Our Vis2Plan outperforms strong diffusion video generation-based visual planners by delivering 53\% higher aggregate success rate in real robot settings while generating visual plans 35$\times$ faster. The results indicate that Vis2Plan is able to generate physically consistent image goals while offering fully inspectable reasoning steps.
HMR-ODTA: Online Diverse Task Allocation for a Team of Heterogeneous Mobile Robots
Coordinating time-sensitive deliveries in environments like hospitals poses a complex challenge, particularly when managing multiple online pickup and delivery requests within strict time windows using a team of heterogeneous robots. Traditional approaches fail to address dynamic rescheduling or diverse service requirements, typically restricting robots to single-task types. This paper tackles the Multi-Pickup and Delivery Problem with Time Windows (MPDPTW), where autonomous mobile robots are capable of handling varied service requests. The objective is to minimize late delivery penalties while maximizing task completion rates. To achieve this, we propose a novel framework leveraging a heterogeneous robot team and an efficient dynamic scheduling algorithm that supports dynamic task rescheduling. Users submit requests with specific time constraints, and our decentralized algorithm, Heterogeneous Mobile Robots Online Diverse Task Allocation (HMR-ODTA), optimizes task assignments to ensure timely service while addressing delays or task rejections. Extensive simulations validate the algorithm's effectiveness. For smaller task sets (40-160 tasks), penalties were reduced by nearly 63%, while for larger sets (160-280 tasks), penalties decreased by approximately 50%. These results highlight the algorithm's effectiveness in improving task scheduling and coordination in multi-robot systems, offering a robust solution for enhancing delivery performance in structured, time-critical environments.
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Grasping unknown objects in unstructured environments remains a fundamental challenge in robotics, requiring both semantic understanding and spatial reasoning. Existing methods often rely on dense training datasets or explicit geometric modeling, limiting their scalability to real-world tasks. Recent advances in Large Multimodal Models (LMMs) offer new possibilities for integrating vision and language understanding, but their application to autonomous robotic grasping remains largely unexplored. We present ORACLE-Grasp, a zero-shot framework that leverages LMMs as semantic oracles to guide grasp selection without requiring additional training or human input. The system formulates grasp prediction as a structured, iterative decision process, using dual-prompt tool calling to first extract high-level object context and then select task-relevant grasp regions. By discretizing the image space and reasoning over candidate areas, ORACLE-Grasp mitigates the spatial imprecision common in LMMs and produces human-like, task-driven grasp suggestions. Early stopping and depth-based refinement steps further enhance efficiency and physical grasp reliability. Experiments demonstrate that the predicted grasps achieve low positional and orientation errors relative to human-annotated ground truth and lead to high success rates in real-world pick up tasks. These results highlight the potential of combining language-driven reasoning with lightweight vision techniques to enable robust, autonomous grasping without task-specific datasets or retraining.
MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM
Indoor localization faces persistent challenges in achieving high accuracy, particularly in GPS-deprived environments. This study unveils a cutting-edge handheld indoor localization system that integrates 2D LiDAR and IMU sensors, delivering enhanced high-velocity precision mapping, computational efficiency, and real-time adaptability. Unlike 3D LiDAR systems, it excels with rapid processing, low-cost scalability, and robust performance, setting new standards for emergency response, autonomous navigation, and industrial automation. Enhanced with a CNN-driven object detection framework and optimized through Cartographer SLAM (simultaneous localization and mapping ) in ROS, the system significantly reduces Absolute Trajectory Error (ATE) by 21.03%, achieving exceptional precision compared to state-of-the-art approaches like SC-ALOAM, with a mean x-position error of -0.884 meters (1.976 meters). The integration of CNN-based object detection ensures robustness in mapping and localization, even in cluttered or dynamic environments, outperforming existing methods by 26.09%. These advancements establish the system as a reliable, scalable solution for high-precision localization in challenging indoor scenarios
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning IROS 2025
Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UAV operations require power-efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable-size axis-aligned rectangles and UAV motion with curvature-constrained B\'ezier curves. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies.
comment: Submitted to IROS 2025
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Recent studies have shown the great potential of diffusion models in improving reinforcement learning (RL) by modeling complex policies, expressing a high degree of multi-modality, and efficiently handling high-dimensional continuous control tasks. However, there is currently limited research on how to optimize diffusion-based polices (e.g., Diffusion Policy) fast and stably. In this paper, we propose an Adam-based Diffusion Policy Optimization (ADPO), a fast algorithmic framework containing best practices for fine-tuning diffusion-based polices in robotic control tasks using the adaptive gradient descent method in RL. Adaptive gradient method is less studied in training RL, let alone diffusion-based policies. We confirm that ADPO outperforms other diffusion-based RL methods in terms of overall effectiveness for fine-tuning on standard robotic tasks. Concretely, we conduct extensive experiments on standard robotic control tasks to test ADPO, where, particularly, six popular diffusion-based RL methods are provided as benchmark methods. Experimental results show that ADPO acquires better or comparable performance than the baseline methods. Finally, we systematically analyze the sensitivity of multiple hyperparameters in standard robotics tasks, providing guidance for subsequent practical applications. Our video demonstrations are released in https://github.com/Timeless-lab/ADPO.git.
MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos
Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling method and low training efficiency of current methods remain a critical bottleneck, resulting in substantial computational overhead and time costs. To address this limitation, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method to implicitly enhance the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency via rapid reward optimization and derives the final policy through online fine-tuning. Experimental results demonstrate that MA-ROESL significantly enhances training efficiency while faithfully reproducing locomotion skills in both simulated and real-world settings, thereby underscoring its potential as a robust and scalable framework for efficient robot locomotion skill learning from video demonstrations.
A spherical amplitude-phase formulation for 3-D adaptive line-of-sight (ALOS) guidance with USGES stability guarantees
A recently proposed 3-D adaptive line-of-sight (ALOS) path-following algorithm addressed coupled motion dynamics of marine craft, aircraft, and uncrewed vehicles under environmental disturbances such as wind, waves, and ocean currents. Stability analysis established uniform semiglobal exponential stability (USGES) of the cross- and vertical-track errors using a body-velocity-based amplitude-phase representation of the North-East-Down (NED) kinematic differential equations. In this brief paper, we revisit the ALOS framework and introduce a novel spherical amplitude-phase representation. This formulation yields a more geometrically intuitive and physically observable description of the guidance errors and enables a significantly simplified stability proof. Unlike the previous model, which relied on a vertical crab angle derived from body-frame velocities, the new representation uses an alternative vertical crab angle and retains the USGES property. It also removes restrictive assumptions such as constant altitude/depth or zero horizontal crab angle, and remains valid for general 3-D maneuvers with nonzero roll, pitch, and flight-path angles.
comment: 8 pages, 1 figure
Fast Contact Detection via Fusion of Joint and Inertial Sensors for Parallel Robots in Human-Robot Collaboration
Fast contact detection is crucial for safe human-robot collaboration. Observers based on proprioceptive information can be used for contact detection but have first-order error dynamics, which results in delays. Sensor fusion based on inertial measurement units (IMUs) consisting of accelerometers and gyroscopes is advantageous for reducing delays. The acceleration estimation enables the direct calculation of external forces. For serial robots, the installation of multiple accelerometers and gyroscopes is required for dynamics modeling since the joint coordinates are the minimal coordinates. Alternatively, parallel robots (PRs) offer the potential to use only one IMU on the end-effector platform, which already presents the minimal coordinates of the PR. This work introduces a sensor-fusion method for contact detection using encoders and only one low-cost, consumer-grade IMU for a PR. The end-effector accelerations are estimated by an extended Kalman filter and incorporated into the dynamics to calculate external forces. In real-world experiments with a planar PR, we demonstrate that this approach reduces the detection duration by up to 50% compared to a momentum observer and enables the collision and clamping detection within 3-39ms.
comment: Preprint of a publication accepted for IEEE Robotics and Automation Letters
Automatic Curriculum Learning for Driving Scenarios: Towards Robust and Efficient Reinforcement Learning
This paper addresses the challenges of training end-to-end autonomous driving agents using Reinforcement Learning (RL). RL agents are typically trained in a fixed set of scenarios and nominal behavior of surrounding road users in simulations, limiting their generalization and real-life deployment. While domain randomization offers a potential solution by randomly sampling driving scenarios, it frequently results in inefficient training and sub-optimal policies due to the high variance among training scenarios. To address these limitations, we propose an automatic curriculum learning framework that dynamically generates driving scenarios with adaptive complexity based on the agent's evolving capabilities. Unlike manually designed curricula that introduce expert bias and lack scalability, our framework incorporates a ``teacher'' that automatically generates and mutates driving scenarios based on their learning potential -- an agent-centric metric derived from the agent's current policy -- eliminating the need for expert design. The framework enhances training efficiency by excluding scenarios the agent has mastered or finds too challenging. We evaluate our framework in a reinforcement learning setting where the agent learns a driving policy from camera images. Comparative results against baseline methods, including fixed scenario training and domain randomization, demonstrate that our approach leads to enhanced generalization, achieving higher success rates: +9\% in low traffic density, +21\% in high traffic density, and faster convergence with fewer training steps. Our findings highlight the potential of ACL in improving the robustness and efficiency of RL-based autonomous driving agents.
comment: Accepted in the 36th IEEE Intelligent Vehicles Symposium (IV 2025)
Training Strategies for Efficient Embodied Reasoning
Robot chain-of-thought reasoning (CoT) -- wherein a model predicts helpful intermediate representations before choosing actions -- provides an effective method for improving the generalization and performance of robot policies, especially vision-language-action models (VLAs). While such approaches have been shown to improve performance and generalization, they suffer from core limitations, like needing specialized robot reasoning data and slow inference speeds. To design new robot reasoning approaches that address these issues, a more complete characterization of why reasoning helps policy performance is critical. We hypothesize several mechanisms by which robot reasoning improves policies -- (1) better representation learning, (2) improved learning curricularization, and (3) increased expressivity -- then devise simple variants of robot CoT reasoning to isolate and test each one. We find that learning to generate reasonings does lead to better VLA representations, while attending to the reasonings aids in actually leveraging these features for improved action prediction. Our results provide us with a better understanding of why CoT reasoning helps VLAs, which we use to introduce two simple and lightweight alternative recipes for robot reasoning. Our proposed approaches achieve significant performance gains over non-reasoning policies, state-of-the-art results on the LIBERO-90 benchmark, and a 3x inference speedup compared to standard robot reasoning.
Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning ICLR 2025
Controlling high-dimensional nonlinear systems, such as those found in biological and robotic applications, is challenging due to large state and action spaces. While deep reinforcement learning has achieved a number of successes in these domains, it is computationally intensive and time consuming, and therefore not suitable for solving large collections of tasks that require significant manual tuning. In this work, we introduce Model Predictive Control with Morphology-aware Proportional Control (MPC^2), a hierarchical model-based learning algorithm for zero-shot and near-real-time control of high-dimensional complex dynamical systems. MPC^2 uses a sampling-based model predictive controller for target posture planning, and enables robust control for high-dimensional tasks by incorporating a morphology-aware proportional controller for actuator coordination. The algorithm enables motion control of a high-dimensional human musculoskeletal model in a variety of motion tasks, such as standing, walking on different terrains, and imitating sports activities. The reward function of MPC^2 can be tuned via black-box optimization, drastically reducing the need for human-intensive reward engineering.
comment: Accepted by ICLR 2025
SKiD-SLAM: Robust, Lightweight, and Distributed Multi-Robot LiDAR SLAM in Resource-Constrained Field Environments
Distributed LiDAR SLAM is crucial for achieving efficient robot autonomy and improving the scalability of mapping. However, two issues need to be considered when applying it in field environments: one is resource limitation, and the other is inter/intra-robot association. The resource limitation issue arises when the data size exceeds the processing capacity of the network or memory, especially when utilizing communication systems or onboard computers in the field. The inter/intra-robot association issue occurs due to the narrow convergence region of ICP under large viewpoint differences, triggering many false positive loops and ultimately resulting in an inconsistent global map for multi-robot systems. To tackle these problems, we propose a distributed LiDAR SLAM framework designed for versatile field applications, called SKiD-SLAM. Extending our previous work that solely focused on lightweight place recognition and fast and robust global registration, we present a multi-robot mapping framework that focuses on robust and lightweight inter-robot loop closure in distributed LiDAR SLAM. Through various environmental experiments, we demonstrate that our method is more robust and lightweight compared to other state-of-the-art distributed SLAM approaches, overcoming resource limitation and inter/intra-robot association issues. Also, we validated the field applicability of our approach through mapping experiments in real-world planetary emulation terrain and cave environments, which are in-house datasets. Our code will be available at https://sparolab.github.io/research/skid_slam/.
comment: 8 pages, 10 figures
Constrained Factor Graph Optimization for Robust Networked Pedestrian Inertial Navigation
This paper presents a novel constrained Factor Graph Optimization (FGO)-based approach for networked inertial navigation in pedestrian localization. To effectively mitigate the drift inherent in inertial navigation solutions, we incorporate kinematic constraints directly into the nonlinear optimization framework. Specifically, we utilize equality constraints, such as Zero-Velocity Updates (ZUPTs), and inequality constraints representing the maximum allowable distance between body-mounted Inertial Measurement Units (IMUs) based on human anatomical limitations. While equality constraints are straightforwardly integrated as error factors, inequality constraints cannot be explicitly represented in standard FGO formulations. To address this, we introduce a differentiable softmax-based penalty term in the FGO cost function to enforce inequality constraints smoothly and robustly. The proposed constrained FGO approach leverages temporal correlations across multiple epochs, resulting in optimal state trajectory estimates while consistently maintaining constraint satisfaction. Experimental results confirm that our method outperforms conventional Kalman filter approaches, demonstrating its effectiveness and robustness for pedestrian navigation.
comment: 6 pages, 5 figures. Accepted by 2025 IEEE/ION Position, Location and Navigation Symposium (PLANS)
Reinforcement Learning-based Fault-Tolerant Control for Quadrotor with Online Transformer Adaptation ICRA
Multirotors play a significant role in diverse field robotics applications but remain highly susceptible to actuator failures, leading to rapid instability and compromised mission reliability. While various fault-tolerant control (FTC) strategies using reinforcement learning (RL) have been widely explored, most previous approaches require prior knowledge of the multirotor model or struggle to adapt to new configurations. To address these limitations, we propose a novel hybrid RL-based FTC framework integrated with a transformer-based online adaptation module. Our framework leverages a transformer architecture to infer latent representations in real time, enabling adaptation to previously unseen system models without retraining. We evaluate our method in a PyBullet simulation under loss-of-effectiveness actuator faults, achieving a 95% success rate and a positional root mean square error (RMSE) of 0.129 m, outperforming existing adaptation methods with 86% success and an RMSE of 0.153 m. Further evaluations on quadrotors with varying configurations confirm the robustness of our framework across untrained dynamics. These results demonstrate the potential of our framework to enhance the adaptability and reliability of multirotors, enabling efficient fault management in dynamic and uncertain environments. Website is available at http://00dhkim.me/paper/rl-ftc
comment: Accpted at the 2025 IEEE International Conference on Robotics & Automation (ICRA) Workshop: Robots in the Wild
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles
Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs in complex marine environments. However, scaling these techniques to a fleet--essential for multi-target tracking or targets with rapid, unpredictable motion--presents significant computational challenges. Multi-Agent Reinforcement Learning (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide 100x faster-than-real-time single-robot simulations, they offer no significant speedup for multi-vehicle scenarios, making MARL training impractical. To address these limitations, we propose an iterative distillation method that transfers high-fidelity simulations into a simplified, GPU-accelerated environment while preserving high-level dynamics. This approach achieves up to a 30,000x speedup over Gazebo through parallelization, enabling efficient training via end-to-end GPU acceleration. Additionally, we introduce a novel Transformer-based architecture (TransfMAPPO) that learns multi-agent policies invariant to the number of agents and targets, significantly improving sample efficiency. Following large-scale curriculum learning conducted entirely on GPU, we perform extensive evaluations in Gazebo, demonstrating that our method maintains tracking errors below 5 meters over extended durations, even in the presence of multiple fast-moving targets. This work bridges the gap between large-scale MARL training and high-fidelity deployment, providing a scalable framework for autonomous fleet control in real-world sea missions.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
HandCept: A Visual-Inertial Fusion Framework for Accurate Proprioception in Dexterous Hands
As robotics progresses toward general manipulation, dexterous hands are becoming increasingly critical. However, proprioception in dexterous hands remains a bottleneck due to limitations in volume and generality. In this work, we present HandCept, a novel visual-inertial proprioception framework designed to overcome the challenges of traditional joint angle estimation methods. HandCept addresses the difficulty of achieving accurate and robust joint angle estimation in dynamic environments where both visual and inertial measurements are prone to noise and drift. It leverages a zero-shot learning approach using a wrist-mounted RGB-D camera and 9-axis IMUs, fused in real time via a latency-free Extended Kalman Filter (EKF). Our results show that HandCept achieves joint angle estimation errors between $2^{\circ}$ and $4^{\circ}$ without observable drift, outperforming visual-only and inertial-only methods. Furthermore, we validate the stability and uniformity of the IMU system, demonstrating that a common base frame across IMUs simplifies system calibration. To support sim-to-real transfer, we also open-sourced our high-fidelity rendering pipeline, which is essential for training without real-world ground truth. This work offers a robust, generalizable solution for proprioception in dexterous hands, with significant implications for robotic manipulation and human-robot interaction.
comment: 8 pages, 7 figures, journal
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Recent advancements in integrating tactile sensing with vision-language models (VLMs) have demonstrated remarkable potential for robotic multimodal perception. However, existing tactile descriptions remain limited to superficial attributes like texture, neglecting critical contact states essential for robotic manipulation. To bridge this gap, we propose CLTP, an intuitive and effective language tactile pretraining framework that aligns tactile 3D point clouds with natural language in various contact scenarios, thus enabling contact-state-aware tactile language understanding for contact-rich manipulation tasks. We first collect a novel dataset of 50k+ tactile 3D point cloud-language pairs, where descriptions explicitly capture multidimensional contact states (e.g., contact location, shape, and force) from the tactile sensor's perspective. CLTP leverages a pre-aligned and frozen vision-language feature space to bridge holistic textual and tactile modalities. Experiments validate its superiority in three downstream tasks: zero-shot 3D classification, contact state classification, and tactile 3D large language model (LLM) interaction. To the best of our knowledge, this is the first study to align tactile and language representations from the contact state perspective for manipulation tasks, providing great potential for tactile-language-action model learning. Code and datasets are open-sourced at https://sites.google.com/view/cltp/.
comment: 16 pages
A Tightly Coupled IMU-Based Motion Capture Approach for Estimating Multibody Kinematics and Kinetics
Inertial Measurement Units (IMUs) enable portable, multibody motion capture (MoCap) in diverse environments beyond the laboratory, making them a practical choice for diagnosing mobility disorders and supporting rehabilitation in clinical or home settings. However, challenges associated with IMU measurements, including magnetic distortions and drift errors, complicate their broader use for MoCap. In this work, we propose a tightly coupled motion capture approach that directly integrates IMU measurements with multibody dynamic models via an Iterated Extended Kalman Filter (IEKF) to simultaneously estimate the system's kinematics and kinetics. By enforcing kinematic and kinetic properties and utilizing only accelerometer and gyroscope data, our method improves IMU-based state estimation accuracy. Our approach is designed to allow for incorporating additional sensor data, such as optical MoCap measurements and joint torque readings, to further enhance estimation accuracy. We validated our approach using highly accurate ground truth data from a 3 Degree of Freedom (DoF) pendulum and a 6 DoF Kuka robot. We demonstrate a maximum Root Mean Square Difference (RMSD) in the pendulum's computed joint angles of 3.75 degrees compared to optical MoCap Inverse Kinematics (IK), which serves as the gold standard in the absence of internal encoders. For the Kuka robot, we observe a maximum joint angle RMSD of 3.24 degrees compared to the Kuka's internal encoders, while the maximum joint angle RMSD of the optical MoCap IK compared to the encoders was 1.16 degrees. Additionally, we report a maximum joint torque RMSD of 2 Nm in the pendulum compared to optical MoCap Inverse Dynamics (ID), and 3.73 Nm in the Kuka robot relative to its internal torque sensors.
JcvPCA and JsvCRP : a set of metrics to evaluate changes in joint coordination strategies
Characterizing changes in inter-joint coordination presents significant challenges, as it necessitates the examination of relationships between multiple degrees of freedom during movements and their temporal evolution. Existing metrics are inadequate in providing physiologically coherent results that document both the temporal and spatial aspects of inter-joint coordination. In this article, we introduce two novel metrics to enhance the analysis of inter-joint coordination. The first metric, Joint Contribution Variation based on Principal Component Analysis (JcvPCA), evaluates the variation in each joint's contribution during series of movements. The second metric, Joint Synchronization Variation based on Continuous Relative Phase (JsvCRP), measures the variation in temporal synchronization among joints between two movement datasets. We begin by presenting each metric and explaining their derivation. We then demonstrate the application of these metrics using simulated and experimental datasets involving identical movement tasks performed with distinct coordination strategies. The results show that these metrics can successfully differentiate between unique coordination strategies, providing meaningful insights into joint collaboration during movement. These metrics hold significant potential for fields such as ergonomics and clinical rehabilitation, where a precise understanding of the evolution of inter-joint coordination strategies is crucial. Potential applications include evaluating the effects of upper limb exoskeletons in industrial settings or monitoring the progress of patients undergoing neurological rehabilitation.
Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning
This work presents a Hierarchical Multi-Agent Reinforcement Learning framework for analyzing simulated air combat scenarios involving heterogeneous agents. The objective is to identify effective Courses of Action that lead to mission success within preset simulations, thereby enabling the exploration of real-world defense scenarios at low cost and in a safe-to-fail setting. Applying deep Reinforcement Learning in this context poses specific challenges, such as complex flight dynamics, the exponential size of the state and action spaces in multi-agent systems, and the capability to integrate real-time control of individual units with look-ahead planning. To address these challenges, the decision-making process is split into two levels of abstraction: low-level policies control individual units, while a high-level commander policy issues macro commands aligned with the overall mission targets. This hierarchical structure facilitates the training process by exploiting policy symmetries of individual agents and by separating control from command tasks. The low-level policies are trained for individual combat control in a curriculum of increasing complexity. The high-level commander is then trained on mission targets given pre-trained control policies. The empirical validation confirms the advantages of the proposed framework.
comment: Published as journal chapter in Deep Learning Applications, Vol. 1, by Taylor & Francis
ChicGrasp: Imitation-Learning based Customized Dual-Jaw Gripper Control for Delicate, Irregular Bio-products Manipulation
Automated poultry processing lines still rely on humans to lift slippery, easily bruised carcasses onto a shackle conveyor. Deformability, anatomical variance, and strict hygiene rules make conventional suction and scripted motions unreliable. We present ChicGrasp, an end--to--end hardware--software co-design for this task. An independently actuated dual-jaw pneumatic gripper clamps both chicken legs, while a conditional diffusion-policy controller, trained from only 50 multi--view teleoperation demonstrations (RGB + proprioception), plans 5 DoF end--effector motion, which includes jaw commands in one shot. On individually presented raw broiler carcasses, our system achieves a 40.6\% grasp--and--lift success rate and completes the pick to shackle cycle in 38 s, whereas state--of--the--art implicit behaviour cloning (IBC) and LSTM-GMM baselines fail entirely. All CAD, code, and datasets will be open-source. ChicGrasp shows that imitation learning can bridge the gap between rigid hardware and variable bio--products, offering a reproducible benchmark and a public dataset for researchers in agricultural engineering and robot learning.
comment: Submitted for journal review
Multi-step manipulation task and motion planning guided by video demonstration
This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm that allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach to go beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (I) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray similar to a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).
Parameter-Efficient Fine-Tuning of Vision Foundation Model for Forest Floor Segmentation from UAV Imagery ICRA
Unmanned Aerial Vehicles (UAVs) are increasingly used for reforestation and forest monitoring, including seed dispersal in hard-to-reach terrains. However, a detailed understanding of the forest floor remains a challenge due to high natural variability, quickly changing environmental parameters, and ambiguous annotations due to unclear definitions. To address this issue, we adapt the Segment Anything Model (SAM), a vision foundation model with strong generalization capabilities, to segment forest floor objects such as tree stumps, vegetation, and woody debris. To this end, we employ parameter-efficient fine-tuning (PEFT) to fine-tune a small subset of additional model parameters while keeping the original weights fixed. We adjust SAM's mask decoder to generate masks corresponding to our dataset categories, allowing for automatic segmentation without manual prompting. Our results show that the adapter-based PEFT method achieves the highest mean intersection over union (mIoU), while Low-rank Adaptation (LoRA), with fewer parameters, offers a lightweight alternative for resource-constrained UAV platforms.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
Deep reinforcement learning-based longitudinal control strategy for automated vehicles at signalised intersections
Developing an autonomous vehicle control strategy for signalised intersections (SI) is one of the challenging tasks due to its inherently complex decision-making process. This study proposes a Deep Reinforcement Learning (DRL) based longitudinal vehicle control strategy at SI. A comprehensive reward function has been formulated with a particular focus on (i) distance headway-based efficiency reward, (ii) decision-making criteria during amber light, and (iii) asymmetric acceleration/ deceleration response, along with the traditional safety and comfort criteria. This reward function has been incorporated with two popular DRL algorithms, Deep Deterministic Policy Gradient (DDPG) and Soft-Actor Critic (SAC), which can handle the continuous action space of acceleration/deceleration. The proposed models have been trained on the combination of real-world leader vehicle (LV) trajectories and simulated trajectories generated using the Ornstein-Uhlenbeck (OU) process. The overall performance of the proposed models has been tested using Cumulative Distribution Function (CDF) plots and compared with the real-world trajectory data. The results show that the RL models successfully maintain lower distance headway (i.e., higher efficiency) and jerk compared to human-driven vehicles without compromising safety. Further, to assess the robustness of the proposed models, we evaluated the model performance on diverse safety-critical scenarios, in terms of car-following and traffic signal compliance. Both DDPG and SAC models successfully handled the critical scenarios, while the DDPG model showed smoother action profiles compared to the SAC model. Overall, the results confirm that DRL-based longitudinal vehicle control strategy at SI can help to improve traffic safety, efficiency, and comfort.
Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation
Autonomy in Minimally Invasive Robotic Surgery (MIRS) has the potential to reduce surgeon cognitive and task load, thereby increasing procedural efficiency. However, implementing accurate autonomous control can be difficult due to poor end-effector proprioception, a limitation of their cable-driven mechanisms. Although the robot may have joint encoders for the end-effector pose calculation, various non-idealities make the entire kinematics chain inaccurate. Modern vision-based pose estimation methods lack real-time capability or can be hard to train and generalize. In this work, we demonstrate a real-time capable, vision transformer-based pose estimation approach that is trained using end-to-end differentiable kinematics and rendering in simulation. We demonstrate the potential of this method to correct for noisy pose estimates in simulation, with the longer term goal of verifying the sim-to-real transferability of our approach.
Generative AI for Autonomous Driving: Frontiers and Opportunities
Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, particularly the pursuit of Level 5 autonomy. This survey delivers a comprehensive and critical synthesis of the emerging role of GenAI across the autonomous driving stack. We begin by distilling the principles and trade-offs of modern generative modeling, encompassing VAEs, GANs, Diffusion Models, and Large Language Models (LLMs). We then map their frontier applications in image, LiDAR, trajectory, occupancy, video generation as well as LLM-guided reasoning and decision making. We categorize practical applications, such as synthetic data workflows, end-to-end driving strategies, high-fidelity digital twin systems, smart transportation networks, and cross-domain transfer to embodied AI. We identify key obstacles and possibilities such as comprehensive generalization across rare cases, evaluation and safety checks, budget-limited implementation, regulatory compliance, ethical concerns, and environmental effects, while proposing research plans across theoretical assurances, trust metrics, transport integration, and socio-technical influence. By unifying these threads, the survey provides a forward-looking reference for researchers, engineers, and policymakers navigating the convergence of generative AI and advanced autonomous mobility. An actively maintained repository of cited works is available at https://github.com/taco-group/GenAI4AD.
Efficiently Manipulating Clutter via Learning and Search-Based Reasoning
This thesis presents novel algorithms to advance robotic object rearrangement, a critical task for autonomous systems in applications like warehouse automation and household assistance. Addressing challenges of high-dimensional planning, complex object interactions, and computational demands, our work integrates deep learning for interaction prediction, tree search for action sequencing, and parallelized computation for efficiency. Key contributions include the Deep Interaction Prediction Network (DIPN) for accurate push motion forecasting (over 90% accuracy), its synergistic integration with Monte Carlo Tree Search (MCTS) for effective non-prehensile object retrieval (100% completion in specific challenging scenarios), and the Parallel MCTS with Batched Simulations (PMBS) framework, which achieves substantial planning speed-up while maintaining or improving solution quality. The research further explores combining diverse manipulation primitives, validated extensively through simulated and real-world experiments.
comment: PhD Thesis of Baichuan Huang, written under the direction of Prof. Jingjin Yu
Aerial Path Online Planning for Urban Scene Updation SIGGRAPH 2025
We present the first scene-update aerial path planning algorithm specifically designed for detecting and updating change areas in urban environments. While existing methods for large-scale 3D urban scene reconstruction focus on achieving high accuracy and completeness, they are inefficient for scenarios requiring periodic updates, as they often re-explore and reconstruct entire scenes, wasting significant time and resources on unchanged areas. To address this limitation, our method leverages prior reconstructions and change probability statistics to guide UAVs in detecting and focusing on areas likely to have changed. Our approach introduces a novel changeability heuristic to evaluate the likelihood of changes, driving the planning of two flight paths: a prior path informed by static priors and a dynamic real-time path that adapts to newly detected changes. The framework integrates surface sampling and candidate view generation strategies, ensuring efficient coverage of change areas with minimal redundancy. Extensive experiments on real-world urban datasets demonstrate that our method significantly reduces flight time and computational overhead, while maintaining high-quality updates comparable to full-scene re-exploration and reconstruction. These contributions pave the way for efficient, scalable, and adaptive UAV-based scene updates in complex urban environments.
comment: ACM SIGGRAPH 2025 (Patent Protected); Project page: https://vcc.tech/research/2025/DroneUpdate ; Video: https://youtu.be/3nNBoh2syTU?si=LQ04NlJNyPosdm8F
Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation ICML 2025
Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to capture more diverse information. However, these methods overlook whether the learned representations contain redundant information and lack a solid theoretical foundation to guide the learning process. To address these limitations, we adopt an information-theoretic perspective and introduce mutual information to quantify and mitigate redundancy in latent representations. Building on this, we incorporate the Information Bottleneck (IB) principle into BC, which extends the idea of reducing redundancy by providing a structured framework for compressing irrelevant information while preserving task-relevant features. This work presents the first comprehensive study on redundancy in latent representations across various methods, backbones, and experimental settings, while extending the generalizability of the IB to BC. Extensive experiments and analyses on the CortexBench and LIBERO benchmarks demonstrate significant performance improvements with IB, underscoring the importance of reducing input data redundancy and highlighting its practical value for more practical applications. Project Page: https://baishuanghao.github.io/BC-IB.github.io.
comment: Accepted by ICML 2025
CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks
Diffusion-based planners have shown strong performance in short-horizon tasks but often fail in complex, long-horizon settings. We trace the failure to loose coupling between high-level (HL) sub-goal selection and low-level (LL) trajectory generation, which leads to incoherent plans and degraded performance. We propose Coupled Hierarchical Diffusion (CHD), a framework that models HL sub-goals and LL trajectories jointly within a unified diffusion process. A shared classifier passes LL feedback upstream so that sub-goals self-correct while sampling proceeds. This tight HL-LL coupling improves trajectory coherence and enables scalable long-horizon diffusion planning. Experiments across maze navigation, tabletop manipulation, and household environments show that CHD consistently outperforms both flat and hierarchical diffusion baselines. Our website is: https://sites.google.com/view/chd2025/home
PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner
In decentralized multiagent trajectory planners, agents need to communicate and exchange their positions to generate collision-free trajectories. However, due to localization errors/uncertainties, trajectory deconfliction can fail even if trajectories are perfectly shared between agents. To address this issue, we first present PARM and PARM*, perception-aware, decentralized, asynchronous multiagent trajectory planners that enable a team of agents to navigate uncertain environments while deconflicting trajectories and avoiding obstacles using perception information. PARM* differs from PARM as it is less conservative, using more computation to find closer-to-optimal solutions. While these methods achieve state-of-the-art performance, they suffer from high computational costs as they need to solve large optimization problems onboard, making it difficult for agents to replan at high rates. To overcome this challenge, we present our second key contribution, PRIMER, a learning-based planner trained with imitation learning (IL) using PARM* as the expert demonstrator. PRIMER leverages the low computational requirements at deployment of neural networks and achieves a computation speed up to 5500 times faster than optimization-based approaches.
comment: 7 pages, 3 figures
Visual Imitation Enables Contextual Humanoid Control
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them-casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills-all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
comment: Project website: https://www.videomimic.net/
EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners
To handle the complexities of real-world traffic, learning planners for self-driving from data is a promising direction. While recent approaches have shown great progress, they typically assume a setting in which the ground-truth world state is available as input. However, when deployed, planning needs to be robust to the long-tail of errors incurred by a noisy perception system, which is often neglected in evaluation. To address this, previous work has proposed drawing adversarial samples from a perception error model (PEM) mimicking the noise characteristics of a target object detector. However, these methods use simple PEMs that fail to accurately capture all failure modes of detection. In this paper, we present EMPERROR, a novel transformer-based generative PEM, apply it to stress-test an imitation learning (IL)-based planner and show that it imitates modern detectors more faithfully than previous work. Furthermore, it is able to produce realistic noisy inputs that increase the planner's collision rate by up to 85%, demonstrating its utility as a valuable tool for a more complete evaluation of self-driving planners.
comment: Project page: https://lasnik.github.io/emperror/
Nano Drone-based Indoor Crime Scene Analysis
Technologies such as robotics, Artificial Intelligence (AI), and Computer Vision (CV) can be applied to crime scene analysis (CSA) to help protect lives, facilitate justice, and deter crime, but an overview of the tasks that can be automated has been lacking. Here we follow a speculative prototyping approach: First, the STAIR tool is used to rapidly review the literature and identify tasks that seem to have not received much attention, like accessing crime scenes through a window, mapping/gathering evidence, and analyzing blood smears. Secondly, we present a prototype of a small drone that implements these three tasks with 75%, 85%, and 80% performance, to perform a minimal analysis of an indoor crime scene. Lessons learned are reported, toward guiding next work.
comment: 8 pages, 6 figures, 3 tables, accepted at ARSO 2025
Optimization-free Smooth Control Barrier Function for Polygonal Collision Avoidance
Polygonal collision avoidance (PCA) is short for the problem of collision avoidance between two polygons (i.e., polytopes in planar) that own their dynamic equations. This problem suffers the inherent difficulty in dealing with non-smooth boundaries and recently optimization-defined metrics, such as signed distance field (SDF) and its variants, have been proposed as control barrier functions (CBFs) to tackle PCA problems. In contrast, we propose an optimization-free smooth CBF method in this paper, which is computationally efficient and proved to be nonconservative. It is achieved by three main steps: a lower bound of SDF is expressed as a nested Boolean logic composition first, then its smooth approximation is established by applying the latest log-sum-exp method, after which a specified CBF-based safety filter is proposed to address this class of problems. To illustrate its wide applications, the optimization-free smooth CBF method is extended to solve distributed collision avoidance of two underactuated nonholonomic vehicles and drive an underactuated container crane to avoid a moving obstacle respectively, for which numerical simulations are also performed.
Towards Anytime Optical Flow Estimation with Event Cameras
Event cameras respond to changes in log-brightness at the millisecond level, making them ideal for optical flow estimation. However, existing datasets from event cameras provide only low frame rate ground truth for optical flow, limiting the research potential of event-driven optical flow. To address this challenge, we introduce a low-latency event representation, Unified Voxel Grid, and propose EVA-Flow, an EVent-based Anytime Flow estimation network to produce high-frame-rate event optical flow with only low-frame-rate optical flow ground truth for supervision. Furthermore, we propose the Rectified Flow Warp Loss (RFWL) for the unsupervised assessment of intermediate optical flow. A comprehensive variety of experiments on MVSEC, DESC, and our EVA-FlowSet demonstrates that EVA-Flow achieves competitive performance, super-low-latency (5ms), time-dense motion estimation (200Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
comment: Accepted to Sensors. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face significant challenges: they are slow during inference and require extensive pre-training on large amounts of robotic data, making real-world deployment difficult. In this paper, we introduce a new family of compact vision-language-action models, called TinyVLA, which offers two key advantages over existing VLA models: (1) faster inference speeds, and (2) improved data efficiency, eliminating the need for pre-training stage. Our framework incorporates two essential components to build TinyVLA: (1) initializing the policy backbone with robust, high-speed multimodal models, and (2) integrating a diffusion policy decoder during fine-tuning to enable precise robot actions. We conducted extensive evaluations of TinyVLA in both simulation and on real robots, demonstrating that our approach significantly outperforms the state-of-the-art VLA model, OpenVLA, in terms of speed and data efficiency, while delivering comparable or superior performance. Additionally, TinyVLA exhibits strong generalization capabilities across various dimensions, including language instructions, novel objects, unseen positions, changes in object appearance, background variations, and environmental shifts, often matching or exceeding the performance of OpenVLA. We believe that \methodname offers an interesting perspective on utilizing pre-trained multimodal models for policy learning. Our project is at https://tiny-vla.github.io.
comment: add more citations
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Enabling robots to perform diverse tasks across varied environments is a central challenge in robot learning. While vision-language-action (VLA) models have shown promise for generalizable robot skills, realizing their full potential requires addressing limitations in action representation and efficient training. Current VLA models often focus on scaling the vision-language model (VLM) component, while the action space representation remains a critical bottleneck. This paper introduces DexVLA, a novel framework designed to enhance the efficiency and generalization capabilities of VLAs for complex, long-horizon tasks across diverse robot embodiments. DexVLA features a novel diffusion-based action expert, scaled to one billion parameters, designed for cross-embodiment learning. A novel embodiment curriculum learning strategy facilitates efficient training: (1) pre-training the diffusion expert that is separable from the VLA on cross-embodiment data, (2) aligning the VLA model to specific embodiments, and (3) post-training for rapid adaptation to new tasks. We conduct comprehensive experiments across multiple embodiments, including single-arm, bimanual, and dexterous hand, demonstrating DexVLA's adaptability to challenging tasks without task-specific adaptation, its ability to learn dexterous skills on novel embodiments with limited data, and its capacity to complete complex, long-horizon tasks using only direct language prompting, such as laundry folding. In all settings, our method demonstrates superior performance compared to state-of-the-art models like Octo, OpenVLA, and Diffusion Policy.
comment: The webpage is at https://dex-vla.github.io/
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
In this paper, we present DiffusionVLA, a novel framework that seamlessly combines the autoregression model with the diffusion model for learning visuomotor policy. Central to our approach is a next-token prediction objective, enabling the model to reason effectively over the user's query in the context of current observations. Subsequently, a diffusion model is attached to generate robust action outputs. To enhance policy learning through self-reasoning, we introduce a novel reasoning injection module that integrates reasoning phrases directly into the policy learning process. The whole framework is simple and flexible, making it easy to deploy and upgrade. We conduct extensive experiments using multiple real robots to validate the effectiveness of DiffusionVLA. Our tests include a challenging factory sorting task, where DiffusionVLA successfully categorizes objects, including those not seen during training. We observe that the reasoning module makes the model interpretable. It allows observers to understand the model thought process and identify potential causes of policy failures. Additionally, we test DiffusionVLA on a zero-shot bin-picking task, achieving 63.7\% accuracy on 102 previously unseen objects. Our method demonstrates robustness to visual changes, such as distractors and new backgrounds, and easily adapts to new embodiments. Furthermore, DiffusionVLA can follow novel instructions and retain conversational ability. Notably, DiffusionVLA is data-efficient and fast at inference; our smallest DiffusionVLA-2B runs 82Hz on a single A6000 GPU and can train from scratch on less than 50 demonstrations for a complex task. Finally, we scale the model from 2B to 72B parameters, showcasing improved generalization capabilities with increased model size.
comment: The project page is available at: http://diffusion-vla.github.io
iA$^*$: Imperative Learning-based A$^*$ Search for Path Planning
The pathfinding problem, which aims to identify a collision-free path between two points, is crucial for many applications, such as robot navigation and autonomous driving. Classic methods, such as A$^*$ search, perform well on small-scale maps but face difficulties scaling up. Conversely, data-driven approaches can improve pathfinding efficiency but require extensive data labeling and lack theoretical guarantees, making it challenging for practical applications. To combine the strengths of the two methods, we utilize the imperative learning (IL) strategy and propose a novel self-supervised pathfinding framework, termed imperative learning-based A$^*$ (iA$^*$). Specifically, iA$^*$ is a bilevel optimization process where the lower-level optimization is dedicated to finding the optimal path by a differentiable A$^*$ search module, and the upper-level optimization narrows down the search space to improve efficiency via setting suitable initial values from a data-driven model. Besides, the model within the upper-level optimization is a fully convolutional network, trained by the calculated loss in the lower-level optimization. Thus, the framework avoids extensive data labeling and can be applied in diverse environments. Our comprehensive experiments demonstrate that iA$^*$ surpasses both classical and data-driven methods in pathfinding efficiency and shows superior robustness among different tasks, validated with public datasets and simulation environments.
ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Using Lightweight Polygon Maps
This paper presents an effective and reliable pose tracking solution, termed ERPoT, for mobile robots operating in large-scale outdoor and challenging indoor environments, underpinned by an innovative prior polygon map. Especially, to overcome the challenge that arises as the map size grows with the expansion of the environment, the novel form of a prior map composed of multiple polygons is proposed. Benefiting from the use of polygons to concisely and accurately depict environmental occupancy, the prior polygon map achieves long-term reliable pose tracking while ensuring a compact form. More importantly, pose tracking is carried out under pure LiDAR mode, and the dense 3D point cloud is transformed into a sparse 2D scan through ground removal and obstacle selection. On this basis, a novel cost function for pose estimation through point-polygon matching is introduced, encompassing two distinct constraint forms: point-to-vertex and point-to-edge. In this study, our primary focus lies on two crucial aspects: lightweight and compact prior map construction, as well as effective and reliable robot pose tracking. Both aspects serve as the foundational pillars for future navigation across diverse mobile platforms equipped with different LiDAR sensors in varied environments. Comparative experiments based on the publicly available datasets and our self-recorded datasets are conducted, and evaluation results show the superior performance of ERPoT on reliability, prior map size, pose estimation error, and runtime over the other six approaches. The corresponding code can be accessed at https://github.com/ghm0819/ERPoT, and the supplementary video is at https://youtu.be/cseml5FrW1Q.
comment: 20 pages, 21 figures
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation
The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with the Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight paths-and-action plans through simple text requests. This system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by VLM and natural language processing by GPT can provide the user with the path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed the difference in the length of the created trajectory in 22% and the mean error in finding the objects of interest on a map in 34.22 m by Euclidean distance in the K-Nearest Neighbors (KNN) approach.
comment: HRI 2025
UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue
Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. In order to address these scenarios, a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response), is developed in this research. This system consists of two aspects: 1) A multimodal system which harnesses the power of Visual Language Model (VLM) and the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation. 2) A non-linearmodel predictive control (NMPC) with built-in obstacle avoidance for rapid response by a drone to fly according to the output of the multimodal system. This work aims at improving response times in emergency SAR operations by providing a more intuitive and natural approach to the operator to plan the SAR mission while allowing the drone to carry out that mission in a rapid and safe manner. When tested, our approach was faster on an average by 33.75% when compared with an off-the-shelf autopilot and 54.6% when compared with a human pilot. Video of UAV-VLRR: https://youtu.be/KJqQGKKt1xY
comment: UAV-VLRR
Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Compared to existing SCR methods, we propose a unified architecture for both scene encoding and salient keypoint detection, allowing our system to prioritize the encoding of informative regions. This design significantly improves computational efficiency. Additionally, we introduce a mechanism that utilizes sequential information during both mapping and relocalization. The proposed method enhances the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
comment: 8 pages, 6 figures
Enhanced Importance Sampling through Latent Space Exploration in Normalizing Flows AAAI 2025
Importance sampling is a rare event simulation technique used in Monte Carlo simulations to bias the sampling distribution towards the rare event of interest. By assigning appropriate weights to sampled points, importance sampling allows for more efficient estimation of rare events or tails of distributions. However, importance sampling can fail when the proposal distribution does not effectively cover the target distribution. In this work, we propose a method for more efficient sampling by updating the proposal distribution in the latent space of a normalizing flow. Normalizing flows learn an invertible mapping from a target distribution to a simpler latent distribution. The latent space can be more easily explored during the search for a proposal distribution, and samples from the proposal distribution are recovered in the space of the target distribution via the invertible mapping. We empirically validate our methodology on simulated robotics applications such as autonomous racing and aircraft ground collision avoidance.
comment: Accepted at AAAI 2025
Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
Functional grasp is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. However, most prior work either focuses on power grasping, which simply involves holding an object still, or relies on costly teleoperated robot demonstrations to teach robots how to grasp each object functionally. Instead, we propose extracting human grasp information from web images since they depict natural and functional object interactions, thereby bypassing the need for curated demonstrations. We reconstruct human hand-object interaction (HOI) 3D meshes from RGB images, retarget the human hand to multi-finger robot hands, and align the noisy object mesh with its accurate 3D shape. We show that these relatively low-quality HOI data from inexpensive web sources can effectively train a functional grasping model. To further expand the grasp dataset for seen and unseen objects, we use the initially-trained grasping policy with web data in the IsaacGym simulator to generate physically feasible grasps while preserving functionality. We train the grasping model on 10 object categories and evaluate it on 9 unseen objects, including challenging items such as syringes, pens, spray bottles, and tongs, which are underrepresented in existing datasets. The model trained on the web HOI dataset, achieving a 75.8% success rate on seen objects and 61.8% across all objects in simulation, with a 6.7% improvement in success rate and a 1.8x increase in functionality ratings over baselines. Simulator-augmented data further boosts performance from 61.8% to 83.4%. The sim-to-real transfer to the LEAP Hand achieves a 85% success rate. Project website is at: https://web2grasp.github.io/.
Self-Supervised Learning for Robotic Leaf Manipulation: A Hybrid Geometric-Neural Approach
Automating leaf manipulation in agricultural settings faces significant challenges, including the variability of plant morphologies and deformable leaves. We propose a novel hybrid geometric-neural approach for autonomous leaf grasping that combines traditional computer vision with neural networks through self-supervised learning. Our method integrates YOLOv8 for instance segmentation and RAFT-Stereo for 3D depth estimation to build rich leaf representations, which feed into both a geometric feature scoring pipeline and a neural refinement module (GraspPointCNN). The key innovation is our confidence-weighted fusion mechanism that dynamically balances the contribution of each approach based on prediction certainty. Our self-supervised framework uses the geometric pipeline as an expert teacher to automatically generate training data. Experiments demonstrate that our approach achieves an 88.0% success rate in controlled environments and 84.7% in real greenhouse conditions, significantly outperforming both purely geometric (75.3%) and neural (60.2%) methods. This work establishes a new paradigm for agricultural robotics where domain expertise is seamlessly integrated with machine learning capabilities, providing a foundation for fully automated crop monitoring systems.
comment: 13 pages, 9 figures
DiffOG: Differentiable Policy Trajectory Optimization with Generalizability
Imitation learning-based visuomotor policies excel at manipulation tasks but often produce suboptimal action trajectories compared to model-based methods. Directly mapping camera data to actions via neural networks can result in jerky motions and difficulties in meeting critical constraints, compromising safety and robustness in real-world deployment. For tasks that require high robustness or strict adherence to constraints, ensuring trajectory quality is crucial. However, the lack of interpretability in neural networks makes it challenging to generate constraint-compliant actions in a controlled manner. This paper introduces differentiable policy trajectory optimization with generalizability (DiffOG), a learning-based trajectory optimization framework designed to enhance visuomotor policies. By leveraging the proposed differentiable formulation of trajectory optimization with transformer, DiffOG seamlessly integrates policies with a generalizable optimization layer. DiffOG refines action trajectories to be smoother and more constraint-compliant while maintaining alignment with the original demonstration distribution, thus avoiding degradation in policy performance. We evaluated DiffOG across 11 simulated tasks and 2 real-world tasks. The results demonstrate that DiffOG significantly enhances the trajectory quality of visuomotor policies while having minimal impact on policy performance, outperforming trajectory processing baselines such as greedy constraint clipping and penalty-based trajectory optimization. Furthermore, DiffOG achieves superior performance compared to existing constrained visuomotor policy. Please visit the project website for more details: https://zhengtongxu.github.io/diffog-website/.
SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment
In battlefield environments, adversaries frequently disrupt GPS signals, requiring alternative localization and navigation methods. Traditional vision-based approaches like Simultaneous Localization and Mapping (SLAM) and Visual Odometry (VO) involve complex sensor fusion and high computational demand, whereas range-free methods like DV-HOP face accuracy and stability challenges in sparse, dynamic networks. This paper proposes LanBLoc-BMM, a navigation approach using landmark-based localization (LanBLoc) combined with a battlefield-specific motion model (BMM) and Extended Kalman Filter (EKF). Its performance is benchmarked against three state-of-the-art visual localization algorithms integrated with BMM and Bayesian filters, evaluated on synthetic and real-imitated trajectory datasets using metrics including Average Displacement Error (ADE), Final Displacement Error (FDE), and a newly introduced Average Weighted Risk Score (AWRS). LanBLoc-BMM (with EKF) demonstrates superior performance in ADE, FDE, and AWRS on real-imitated datasets. Additionally, two safe navigation methods, SafeNav-CHull and SafeNav-Centroid, are introduced by integrating LanBLoc-BMM(EKF) with a novel Risk-Aware RRT* (RAw-RRT*) algorithm for obstacle avoidance and risk exposure minimization. Simulation results in battlefield scenarios indicate SafeNav-Centroid excels in accuracy, risk exposure, and trajectory efficiency, while SafeNav-CHull provides superior computational speed.
comment: 10 pages, conference paper. arXiv admin note: text overlap with arXiv:2402.14280
DiffCloud: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects
Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed. We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude. Videos and supplementary material are available at https://diffcloud.github.io.
Distributed Certifiably Correct Range-Aided SLAM ICRA
Reliable simultaneous localization and mapping (SLAM) algorithms are necessary for safety-critical autonomous navigation. In the communication-constrained multi-agent setting, navigation systems increasingly use point-to-point range sensors as they afford measurements with low bandwidth requirements and known data association. The state estimation problem for these systems takes the form of range-aided (RA) SLAM. However, distributed algorithms for solving the RA-SLAM problem lack formal guarantees on the quality of the returned estimate. To this end, we present the first distributed algorithm for RA-SLAM that can efficiently recover certifiably globally optimal solutions. Our algorithm, distributed certifiably correct RA-SLAM (DCORA), achieves this via the Riemannian Staircase method, where computational procedures developed for distributed certifiably correct pose graph optimization are generalized to the RA-SLAM problem. We demonstrate DCORA's efficacy on real-world multi-agent datasets by achieving absolute trajectory errors comparable to those of a state-of-the-art centralized certifiably correct RA-SLAM algorithm. Additionally, we perform a parametric study on the structure of the RA-SLAM problem using synthetic data, revealing how common parameters affect DCORA's performance.
comment: 8 pages, 3 figures, accepted to the 2025 International Conference on Robotics and Automation (ICRA). This version includes minor clerical edits to the published version in the conference proceedings
Optimal navigation of magnetic artificial microswimmers in blood capillaries with deep reinforcement learning
Biomedical applications such as targeted drug delivery, microsurgery, and sensing rely on reaching precise areas within the body in a minimally invasive way. Artificial bacterial flagella (ABFs) have emerged as potential tools for this task by navigating through the circulatory system with the help of external magnetic fields. While their swimming characteristics are well understood in simple settings, their controlled navigation through realistic capillary networks remains a significant challenge due to the complexity of blood flow and the high computational cost of detailed simulations. We address this challenge by conducting numerical simulations of ABFs in retinal capillaries, propelled by an external magnetic field. The simulations are based on a validated blood model that predicts the dynamics of individual red blood cells and their hydrodynamic interactions with ABFs. The magnetic field follows a control policy that brings the ABF to a prescribed target. The control policy is learned with an actor-critic, off-policy reinforcement learning algorithm coupled with a reduced-order model of the system. We show that the same policy robustly guides the ABF to a prescribed target in both the reduced-order model and the fine-grained blood simulations. This approach is suitable for designing robust control policies for personalized medicine at moderate computational cost.
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, as well as cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually-crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We systematically evaluate our method on simulation and real-world tasks. In real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
Multiagent Systems
Scalable UAV Multi-Hop Networking via Multi-Agent Reinforcement Learning with Large Language Models
In disaster scenarios, establishing robust emergency communication networks is critical, and unmanned aerial vehicles (UAVs) offer a promising solution to rapidly restore connectivity. However, organizing UAVs to form multi-hop networks in large-scale dynamic environments presents significant challenges, including limitations in algorithmic scalability and the vast exploration space required for coordinated decision-making. To address these issues, we propose MRLMN, a novel framework that integrates multi-agent reinforcement learning (MARL) and large language models (LLMs) to jointly optimize UAV agents toward achieving optimal networking performance. The framework incorporates a grouping strategy with reward decomposition to enhance algorithmic scalability and balance decision-making across UAVs. In addition, behavioral constraints are applied to selected key UAVs to improve the robustness of the network. Furthermore, the framework integrates LLM agents, leveraging knowledge distillation to transfer their high-level decision-making capabilities to MARL agents. This enhances both the efficiency of exploration and the overall training process. In the distillation module, a Hungarian algorithm-based matching scheme is applied to align the decision outputs of the LLM and MARL agents and define the distillation loss. Extensive simulation results validate the effectiveness of our approach, demonstrating significant improvements in network performance, including enhanced coverage and communication quality.
Benchmarking AI scientists in omics data-driven biological research
The rise of large language models and multi-agent systems has sparked growing interest in AI scientists capable of autonomous biological research. However, existing benchmarks either focus on reasoning without data or on data analysis with predefined statistical answers, lacking realistic, data-driven evaluation settings. Here, we introduce the Biological AI Scientist Benchmark (BaisBench), a benchmark designed to assess AI scientists' ability to generate biological discoveries through data analysis and reasoning with external knowledge. BaisBench comprises two tasks: cell type annotation on 31 expert-labeled single-cell datasets, and scientific discovery through answering 198 multiple-choice questions derived from the biological insights of 41 recent single-cell studies. Systematic experiments on state-of-the-art AI scientists and LLM agents showed that while promising, current models still substantially underperform human experts on both tasks. We hope BaisBench will fill this gap and serve as a foundation for advancing and evaluating AI models for scientific discovery. The benchmark can be found at: https://github.com/EperLuo/BaisBench.
Reciprocity as the Foundational Substrate of Society: How Reciprocal Dynamics Scale into Social Systems
A major bottleneck in multi-agent AI is the lack of simulateable models for the bottom-up emergence of social structure under realistic behavioral constraints. Similarly, many foundational theories in economics and sociology including the concepts of "institutions" and "norms" tend to describe social structures post hoc, often relying on implicit assumptions of shared culture, morality, or symbolic agreement. These concepts are often treated as primitives rather than reconstructed from agent-level behavior, leaving both their origins and operational definitions under-specified. To address this, we propose a three-stage bottom-up framework: Reciprocal Dynamics, capturing individual-level reciprocal exchanges; Norm Stabilization, the consolidation of shared expectations; and Institutional Construction, the externalization of stable patterns into scalable structures. By grounding social emergence in agent-level reciprocity, our framework enables the systematic exploration of how moral, cultural, and institutional structures emerge from cognitively minimal interactions.
comment: First draft extending the first position paper. Main framework complete; historical examples and references will be updated
Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations
We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running the atomistic simulations, monitoring their computation status, analyzing the simulation results, and summarizing them for the user in text and graphical forms. We achieve these goals by exploiting fine-tuned open-source large language models (LLMs), rule-based agents, and a retrieval-augmented generation (RAG) system. Aitomia leverages the versatility of our MLatom ecosystem for AI-enhanced computational chemistry. This intelligent assistant is going to be integrated into the Aitomistic Hub and XACS online computing services, with some functionality already publicly available as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, accelerating research and development in the relevant fields.
Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning
This work presents a Hierarchical Multi-Agent Reinforcement Learning framework for analyzing simulated air combat scenarios involving heterogeneous agents. The objective is to identify effective Courses of Action that lead to mission success within preset simulations, thereby enabling the exploration of real-world defense scenarios at low cost and in a safe-to-fail setting. Applying deep Reinforcement Learning in this context poses specific challenges, such as complex flight dynamics, the exponential size of the state and action spaces in multi-agent systems, and the capability to integrate real-time control of individual units with look-ahead planning. To address these challenges, the decision-making process is split into two levels of abstraction: low-level policies control individual units, while a high-level commander policy issues macro commands aligned with the overall mission targets. This hierarchical structure facilitates the training process by exploiting policy symmetries of individual agents and by separating control from command tasks. The low-level policies are trained for individual combat control in a curriculum of increasing complexity. The high-level commander is then trained on mission targets given pre-trained control policies. The empirical validation confirms the advantages of the proposed framework.
comment: Published as journal chapter in Deep Learning Applications, Vol. 1, by Taylor & Francis
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged as a popular framework for cooperative Multi-Agent Reinforcement Learning (MARL), where agents can use additional global state information to guide training in a centralized way and make their own decisions only based on decentralized local policies. Despite the encouraging results achieved, CTDE makes an independence assumption on agent policies, which limits agents to adopt global cooperative information from each other during centralized training. Therefore, we argue that existing CTDE methods cannot fully utilize global information for training, leading to an inefficient joint-policy exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning, that not only enables an efficacious message exchange among agents during training but also guarantees the independent policies for execution. Firstly, CADP endows agents the explicit communication channel to seek and take advices from different agents for more centralized training. To further ensure the decentralized execution, we propose a smooth model pruning mechanism to progressively constraint the agent communication into a closed one without degradation in agent cooperation capability. Empirical evaluations on StarCraft II micromanagement and Google Research Football benchmarks demonstrate that the proposed framework achieves superior performance compared with the state-of-the-art counterparts. Our code will be made publicly available.
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation NAACL 2025
Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on devices. These challenges arise from the dynamic and structured nature of GUI environments, which integrate text, images, and spatial relationships, as well as the variability in action spaces across different pages and tasks. To address these limitations, we propose MobA, a novel MLLM-based mobile assistant system. MobA introduces an adaptive planning module that incorporates a reflection mechanism for error recovery and dynamically adjusts plans to align with the real environment contexts and action module's execution capacity. Additionally, a multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency. We also present MobBench, a dataset designed for complex mobile interactions. Experimental results on MobBench and AndroidArena demonstrate MobA's ability to handle dynamic GUI environments and perform complex mobile tasks.
comment: NAACL 2025 Demo Track [code] https://github.com/OpenDFM/MobA [dataset] https://huggingface.co/datasets/OpenDFM/MobA-MobBench
Capability-Aware Shared Hypernetworks for Flexible Heterogeneous Multi-Robot Coordination
Recent advances have enabled heterogeneous multi-robot teams to learn complex and effective coordination skills. However, existing neural architectures that support heterogeneous teaming tend to force a trade-off between expressivity and efficiency. Shared-parameter designs prioritize sample efficiency by enabling a single network to be shared across all or a pre-specified subset of robots (via input augmentations), but tend to limit behavioral diversity. In contrast, recent designs employ a separate policy for each robot, enabling greater diversity and expressivity at the cost of efficiency and generalization. Our key insight is that such tradeoffs can be avoided by viewing these design choices as ends of a broad spectrum. Inspired by recent work in transfer and meta learning, and building on prior work in multi-robot task allocation, we propose Capability-Aware Shared Hypernetworks (CASH), a soft weight sharing architecture that uses hypernetworks to efficiently learn a flexible shared policy that dynamically adapts to each robot post-training. By explicitly encoding the impact of robot capabilities (e.g., speed and payload) on collective behavior, CASH enables zero-shot generalization to unseen robots or team compositions. Our experiments involve multiple heterogeneous tasks, three learning paradigms (imitation learning, value-based, and policy-gradient RL), and SOTA multi-robot simulation (JaxMARL) and hardware (Robotarium) platforms. Across all conditions, we find that CASH generates appropriately-diverse behaviors and consistently outperforms baseline architectures in terms of performance and sample efficiency during both training and zero-shot generalization, all with 60%-80% fewer learnable parameters.
comment: 22 pages, 8 figures, equal authorship between Kevin Fu and Shalin Anand Jain
Systems and Control (CS)
Load-independent Metrics for Benchmarking Force Controllers
Torque-controlled actuators are critical components in mechatronic systems that closely interact with their environment, such as legged robots, collaborative manipulators, and exoskeletons. The performance and stability of these actuators depend not only on controller design and system dynamics but also significantly on load characteristics, which may include interactions with humans or unstructured environments. This load dependence highlights the need for frameworks that properly assess and compare torque controllers independent of specific loading conditions. In this short paper, we concisely present a modeling approach that captures the impact of load on the closed-loop dynamics of torque-controlled systems. Based on this model, we propose new methods and quantitative metrics, including the Passivity Index Interval, which blends passivity and small-gain theory to offer a less conservative measure of coupled stability than passivity alone. These metrics can be used alongside traditional control performance indicators, such as settling time and bandwidth, to provide a more comprehensive characterization of torque-controlled systems. We demonstrate the application of the proposed metrics through experimental comparisons of linear actuator force controllers.
Optimal Trajectory Planning with Collision Avoidance for Autonomous Vehicle Maneuvering
To perform autonomous driving maneuvers, such as parallel or perpendicular parking, a vehicle requires continual speed and steering adjustments to follow a generated path. In consequence, the path's quality is a limiting factor of the vehicle maneuver's performance. While most path planning approaches include finding a collision-free route, optimal trajectory planning involves solving the best transition from initial to final states, minimizing the action over all paths permitted by a kinematic model. Here we propose a novel method based on sequential convex optimization, which permits flexible and efficient optimal trajectory generation. The objective is to achieve the fastest time, shortest distance, and fewest number of path segments to satisfy motion requirements, while avoiding sensor blind-spots. In our approach, vehicle kinematics are represented by a discretized Dubins model. To avoid collisions, each waypoint is constrained by linear inequalities representing closest distance of obstacles to a polygon specifying the vehicle's extent. To promote smooth and valid trajectories, the solved kinematic state and control variables are constrained and regularized by penalty terms in the model's cost function, which enforces physical restrictions including limits for steering angle, acceleration and speed. In this paper, we analyze trajectories obtained for several parking scenarios. Results demonstrate efficient and collision-free motion generated by the proposed technique.
comment: SIAM Conference on Control and Its Applications, July 28-30th, 2025, Montreal, Canada
Joint Communication Scheduling and Resource Allocation for Distributed Edge Learning: Seamless Integration in Next-Generation Wireless Networks
Distributed edge learning (DL) is considered a cornerstone of intelligence enablers, since it allows for collaborative training without the necessity for local clients to share raw data with other parties, thereby preserving privacy and security. Integrating DL into the 6G networks requires a coexistence design with existing services such as high-bandwidth (HB) traffic like eMBB. Current designs in the literature mainly focus on communication round-wise designs that assume a rigid resource allocation throughout each communication round (CR). However, rigid resource allocation within a CR is a highly inefficient and inaccurate representation of the system's realistic behavior. This is due to the heterogeneous nature of the system, as clients inherently may need to access the network at different times. This work zooms into one arbitrary CR, and demonstrates the importance of considering a time-dependent resource sharing design with HB traffic. We first formulate a time-step-wise optimization problem to minimize the consumed time by DL within the CR while constrained by a DL energy budget. Due to its intractability, a session-based optimization problem is formulated assuming a CR lasts less than a large-scale coherence time. Some scheduling properties of such multi-server joint communication scheduling and resource allocation framework have been established. An iterative algorithm has been designed to solve such non-convex and non-block-separable-constrained problems. Simulation results confirm the importance of the efficient and accurate integration design proposed in this work.
comment: This work has been submitted to the IEEE for possible publication
Robust Indoor Localization via Conformal Methods and Variational Bayesian Adaptive Filtering
Indoor localization is critical for IoT applications, yet challenges such as non-Gaussian noise, environmental interference, and measurement outliers hinder the robustness of traditional methods. Existing approaches, including Kalman filtering and its variants, often rely on Gaussian assumptions or static thresholds, limiting adaptability in dynamic environments. This paper proposes a hierarchical robust framework integrating Variational Bayesian (VB) parameter learning, Huber M-estimation, and Conformal Outlier Detection (COD) to address these limitations. First, VB inference jointly estimates state and noise parameters, adapting to time-varying uncertainties. Second, Huber-based robust filtering suppresses mild outliers while preserving Gaussian efficiency. Third, COD provides statistical guarantees for outlier detection via dynamically calibrated thresholds, ensuring a user-controlled false alarm rate. Theoretically, we prove the Semi-positive Definiteness of Huber-based Kalman filtering covariance and the coverage of sliding window conformal prediction. Experiments on geomagnetic fingerprint datasets demonstrate significant improvements: fingerprint matching accuracy increases from 81.25% to 93.75%, and positioning errors decrease from 0.62-6.87 m to 0.03-0.35 m. Comparative studies further validate the framework's robustness, showing consistent performance gains under non-Gaussian noise and outlier conditions.
MC-Swarm: Minimal-Communication Multi-Agent Trajectory Planning and Deadlock Resolution for Quadrotor Swarm
For effective multi-agent trajectory planning, it is important to consider lightweight communication and its potential asynchrony. This paper presents a distributed trajectory planning algorithm for a quadrotor swarm that operates asynchronously and requires no communication except during the initial planning phase. Moreover, our algorithm guarantees no deadlock under asynchronous updates and absence of communication during flight. To effectively ensure these points, we build two main modules: coordination state updater and trajectory optimizer. The coordination state updater computes waypoints for each agent toward its goal and performs subgoal optimization while considering deadlocks, as well as safety constraints with respect to neighbor agents and obstacles. Then, the trajectory optimizer generates a trajectory that ensures collision avoidance even with the asynchronous planning updates of neighboring agents. We provide a theoretical guarantee of collision avoidance with deadlock resolution and evaluate the effectiveness of our method in complex simulation environments, including random forests and narrow-gap mazes. Additionally, to reduce the total mission time, we design a faster coordination state update using lightweight communication. Lastly, our approach is validated through extensive simulations and real-world experiments with cluttered environment scenarios.
comment: 13 pages, 11 figures
Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints
This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents, where the nonconvex local loss and convex local constraint functions can vary arbitrarily across iterations, and the information of them is privately revealed to each agent at each iteration. For a uniformly jointly strongly connected time-varying directed graph, we propose two distributed bandit online primal--dual algorithm with compressed communication to efficiently utilize communication resources in the one-point and two-point bandit feedback settings, respectively. In nonconvex optimization, finding a globally optimal decision is often NP-hard. As a result, the standard regret metric used in online convex optimization becomes inapplicable. To measure the performance of the proposed algorithms, we use a network regret metric grounded in the first-order optimality condition associated with the variational inequality. We show that the compressed algorithms establish sublinear network regret and cumulative constraint violation bounds. Finally, a simulation example is presented to validate the theoretical results.
comment: 56 pages, 3 figures
Synthesis of safety certificates for discrete-time uncertain systems via convex optimization
We study the problem of co-designing control barrier functions and linear state feedback controllers for discrete-time linear systems affected by additive disturbances. For disturbances of bounded magnitude, we provide a semi-definite program whose feasibility implies the existence of a control law and a certificate ensuring safety in the infinite horizon with respect to the worst-case disturbance realization in the uncertainty set. For disturbances with unbounded support, we rely on martingale theory to derive a second semi-definite program whose feasibility provides probabilistic safety guarantees holding joint-in-time over a finite time horizon. We examine several extensions, including (i) encoding of different types of input constraints, (ii) robustification against distributional ambiguity around the true distribution, (iii) design of safety filters, and (iv) extension to general safety specifications such as obstacle avoidance.
A High-Efficiency Reconfigurable Bidirectional Array Antenna Based on Transmit-Reflect Switchable Metasurface
This paper proposes a reconfigurable bidirectional array antenna with high-efficiency radiations and flexible beam-switching capability by employing a novel transmit-reflect switchable metasurface (TRSM). To realize the electromagnetic (EM) wave transmitted or reflected manipulation, a dedicated transmit-reflect switch layer (TRSL) with periodically soldered PIN diodes is introduced between two transmitted metasurfaces. By switching ON/OFF the embedded diodes, the TRSL performs as a mesh-type ground layer or polarization-grid layer, exhibiting a reflect or transmit property to the incident wave respectively. Further, utilizing the above TRSM configuration in conjunction with a microstrip feed antenna, bidirectional radiations are obtained at the same frequency and polarization. To further reduce the number of PIN diodes and control complexity, an enhanced TRSM using a single diode to control two unit cells is also investigated, resulting in half PIN diodes reduction. Since the bidirectional beam-switching is achieved by only controlling PIN diodes integrated in the ground plane instead of directly acting on the radiation element, which reduces insertion loss and avoids phase quantization errors, the proposed antenna can maintain a high aperture efficiency. To verify this concept, a prototype was designed, fabricated, and measured, demonstrating a successful realization of backward and forward patterns with peak gains of 22.3 and 22.1 dBi, and aperture efficiencies of 47.2% and 43.8%. The 3-dB gain bandwidths of reflected and transmitted modes are 13.7% and 12.3%. This antenna has the advantages of high gain, high aperture efficiency, simple configuration, cost-effectiveness, and flexible and digital beam control.
comment: 11 pages, 18 figures, published to TAP
Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation
This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics, thereby eliminating a barrier when applying MPC to a renewables-dominated power system. Case study results on an industry park system and the IEEE 30-bus system demonstrate that using the diffusion model to augment the training dataset significantly improves load-forecasting accuracy, and the inferred system dynamics are applicable to the real-time grid operation with solar and wind.
comment: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM) which will be held in Austin, TX, July.27-31, 2005
Max-Min Fairness in Stacked Intelligent Metasurface-Aided Rate Splitting Networks
This paper investigates a downlink multiuser multiple-input single-output system that integrates rate-splitting multiple access (RSMA) with a stacked intelligent metasurface (SIM) to enable wave-domain beamforming. Unlike conventional digital beamforming, the proposed system leverages the programmable phase shifts of the SIM to perform beamforming entirely in the wave domain. In contrast to existing literature, this work introduces a fairness-centric SIM-RSMA design that shifts the emphasis from maximizing sum-rate to ensuring fair allocation of resources. In particular, we formulate a max-min rate optimization problem that jointly optimizes transmit power coefficients at the base station and SIM phase shifts. Given the non-convex nature of this problem, we develop an alternating optimization framework, where the power allocation is optimized through successive convex approximation and SIM beamforming is optimized using the Riemannian conjugate gradient method. Simulation results indicate that combining SIM with RSMA yields superior max-min performance compared to its integration with space division multiple access or non-orthogonal multiple access.
The Quadrature Gaussian Sum Filter and Smoother for Wiener Systems
Block-Oriented Nonlinear (BONL) models, particularly Wiener models, are widely used for their computational efficiency and practicality in modeling nonlinear behaviors in physical systems. Filtering and smoothing methods for Wiener systems, such as particle filters and Kalman-based techniques, often struggle with computational feasibility or accuracy. This work addresses these challenges by introducing a novel Gaussian Sum Filter for Wiener system state estimation that is built on a Gauss-Legendre quadrature approximation of the likelihood function associated with the output signal. In addition to filtering, a two-filter smoothing strategy is proposed, enabling accurate computation of smoothed state distributions at single and consecutive time instants. Numerical examples demonstrate the superiority of the proposed method in balancing accuracy and computational efficiency compared to traditional approaches, highlighting its benefits in control, state estimation and system identification, for Wiener systems.
comment: This work has been submitted to the IEEE for possible publication. 13 pages, 9 Figures
A Practical Approach to Generating First-Order Rician Channel Statistics in a RC plus CATR Chamber at mmWave
This paper explores a novel hybrid configuration integrating a Reverberation Chamber (RC) with a Compact Antenna Test Range (CATR) to achieve a controllable Rician K-factor. The focus is testing directive antennas in the lower FR2 frequency bands (24.25-29.5 GHz) for 5G and beyond wireless applications. The study meticulously evaluates 39 unique configurations, using a stationary horn antenna for consistent reference K-factor characterization, and considers variables like absorbers and CATR polarization. Results demonstrate that the K-factor can be effectively adjusted within the hybrid setup, maintaining substantial margins above the noise level across all configurations. Sample independence is confirmed for at least 600 samples in all cases. The Bootstrap Anderson-Darling goodness-of-fit test verifies that the data align with Rician or Rayleigh distributions. Analysis of total received power, stirred and unstirred power and frequency-dependent modeling reveals that power variables are inversely related to frequency, while the K-factor remains frequency-independent. The hybrid RC-CATR system achieves a wide range of frequency-averaged K-factors from -9.2 dB to 40.8 dB, with an average granularity of 1.3 dB. Notably, configurations using co-polarized CATR signals yield large K-factors, reduced system losses, and improved frequency stability, underscoring the system's efficacy for millimeter-wave over-the-air testing. This research offers a cost-efficient and repeatable method for generating complex Rician fading channels at mmWave frequencies, crucial for the effective OTA testing of advanced wireless devices.
On the Use of CVRP to Diagnose Faulty Elements in Antenna Arrays
This paper investigates the application of Constrained-View Radiated Power (CVRP) for diagnosing phased array element failures, specifically focusing on on-off element failure. CVRP, similar to Partial Radiated Power (PRP), considers a specific Field-of-View (FoV) but normalizes it by the FoV area. The study explores CVRP's effectiveness in detecting failures in a 2x8 cosine element array under beam-steering conditions, accounting for random and depointing errors, angular resolution, and pattern rotation. Results indicate that CVRP can detect on-off failures based on angular resolution and error severity, under the assumption of reduced Total Radiated Power (TRP) with element failures. Additionally, CVRP is effective with partial far-field patterns, making it suitable for near-field, indirect far-field, and far-field measurement systems without requiring phase acquisition in the latter two.
comment: Presented at EuCAP 2025
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning IROS 2025
Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UAV operations require power-efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable-size axis-aligned rectangles and UAV motion with curvature-constrained B\'ezier curves. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies.
comment: Submitted to IROS 2025
Distributionally Robust LQG with Kullback-Leibler Ambiguity Sets
The Linear Quadratic Gaussian (LQG) controller is known to be inherently fragile to model misspecifications often occurring in real-world situations. We consider discretetime partially observable stochastic linear systems and provide a robustification of the standard LQG against distributional uncertainties on the process and measurement noise. Our distributionally robust formulation specifies the admissible perturbations by defining a relative entropy based ambiguity set individually for each time step along a finite-horizon trajectory, and minimizes the worst-case cost across all admissible distributions. Notably, we prove that the optimal control policy is still linear, as in standard LQG, and we derive a computational scheme grounded on iterative best response that provably converges to the set of saddle points. Finally, we consider the case of endogenous uncertainty captured via decision-dependent ambiguity sets and we propose an approximation scheme based on dynamic programming.
A spherical amplitude-phase formulation for 3-D adaptive line-of-sight (ALOS) guidance with USGES stability guarantees
A recently proposed 3-D adaptive line-of-sight (ALOS) path-following algorithm addressed coupled motion dynamics of marine craft, aircraft, and uncrewed vehicles under environmental disturbances such as wind, waves, and ocean currents. Stability analysis established uniform semiglobal exponential stability (USGES) of the cross- and vertical-track errors using a body-velocity-based amplitude-phase representation of the North-East-Down (NED) kinematic differential equations. In this brief paper, we revisit the ALOS framework and introduce a novel spherical amplitude-phase representation. This formulation yields a more geometrically intuitive and physically observable description of the guidance errors and enables a significantly simplified stability proof. Unlike the previous model, which relied on a vertical crab angle derived from body-frame velocities, the new representation uses an alternative vertical crab angle and retains the USGES property. It also removes restrictive assumptions such as constant altitude/depth or zero horizontal crab angle, and remains valid for general 3-D maneuvers with nonzero roll, pitch, and flight-path angles.
comment: 8 pages, 1 figure
Fast Contact Detection via Fusion of Joint and Inertial Sensors for Parallel Robots in Human-Robot Collaboration
Fast contact detection is crucial for safe human-robot collaboration. Observers based on proprioceptive information can be used for contact detection but have first-order error dynamics, which results in delays. Sensor fusion based on inertial measurement units (IMUs) consisting of accelerometers and gyroscopes is advantageous for reducing delays. The acceleration estimation enables the direct calculation of external forces. For serial robots, the installation of multiple accelerometers and gyroscopes is required for dynamics modeling since the joint coordinates are the minimal coordinates. Alternatively, parallel robots (PRs) offer the potential to use only one IMU on the end-effector platform, which already presents the minimal coordinates of the PR. This work introduces a sensor-fusion method for contact detection using encoders and only one low-cost, consumer-grade IMU for a PR. The end-effector accelerations are estimated by an extended Kalman filter and incorporated into the dynamics to calculate external forces. In real-world experiments with a planar PR, we demonstrate that this approach reduces the detection duration by up to 50% compared to a momentum observer and enables the collision and clamping detection within 3-39ms.
comment: Preprint of a publication accepted for IEEE Robotics and Automation Letters
Constrained Factor Graph Optimization for Robust Networked Pedestrian Inertial Navigation
This paper presents a novel constrained Factor Graph Optimization (FGO)-based approach for networked inertial navigation in pedestrian localization. To effectively mitigate the drift inherent in inertial navigation solutions, we incorporate kinematic constraints directly into the nonlinear optimization framework. Specifically, we utilize equality constraints, such as Zero-Velocity Updates (ZUPTs), and inequality constraints representing the maximum allowable distance between body-mounted Inertial Measurement Units (IMUs) based on human anatomical limitations. While equality constraints are straightforwardly integrated as error factors, inequality constraints cannot be explicitly represented in standard FGO formulations. To address this, we introduce a differentiable softmax-based penalty term in the FGO cost function to enforce inequality constraints smoothly and robustly. The proposed constrained FGO approach leverages temporal correlations across multiple epochs, resulting in optimal state trajectory estimates while consistently maintaining constraint satisfaction. Experimental results confirm that our method outperforms conventional Kalman filter approaches, demonstrating its effectiveness and robustness for pedestrian navigation.
comment: 6 pages, 5 figures. Accepted by 2025 IEEE/ION Position, Location and Navigation Symposium (PLANS)
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
A Tightly Coupled IMU-Based Motion Capture Approach for Estimating Multibody Kinematics and Kinetics
Inertial Measurement Units (IMUs) enable portable, multibody motion capture (MoCap) in diverse environments beyond the laboratory, making them a practical choice for diagnosing mobility disorders and supporting rehabilitation in clinical or home settings. However, challenges associated with IMU measurements, including magnetic distortions and drift errors, complicate their broader use for MoCap. In this work, we propose a tightly coupled motion capture approach that directly integrates IMU measurements with multibody dynamic models via an Iterated Extended Kalman Filter (IEKF) to simultaneously estimate the system's kinematics and kinetics. By enforcing kinematic and kinetic properties and utilizing only accelerometer and gyroscope data, our method improves IMU-based state estimation accuracy. Our approach is designed to allow for incorporating additional sensor data, such as optical MoCap measurements and joint torque readings, to further enhance estimation accuracy. We validated our approach using highly accurate ground truth data from a 3 Degree of Freedom (DoF) pendulum and a 6 DoF Kuka robot. We demonstrate a maximum Root Mean Square Difference (RMSD) in the pendulum's computed joint angles of 3.75 degrees compared to optical MoCap Inverse Kinematics (IK), which serves as the gold standard in the absence of internal encoders. For the Kuka robot, we observe a maximum joint angle RMSD of 3.24 degrees compared to the Kuka's internal encoders, while the maximum joint angle RMSD of the optical MoCap IK compared to the encoders was 1.16 degrees. Additionally, we report a maximum joint torque RMSD of 2 Nm in the pendulum compared to optical MoCap Inverse Dynamics (ID), and 3.73 Nm in the Kuka robot relative to its internal torque sensors.
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: 9 pages, 8 figures, conference paper submitted to IEEE
Fault Detection Method for Power Conversion Circuits Using Thermal Image and Convolutional Autoencoder
A fault detection method for power conversion circuits using thermal images and a convolutional autoencoder is presented. The autoencoder is trained on thermal images captured from a commercial power module at randomly varied load currents and augmented image2 generated through image processing techniques such as resizing, rotation, perspective transformation, and bright and contrast adjustment. Since the autoencoder is trained to output images identical to input only for normal samples, it reconstructs images similar to normal ones even when the input images containing faults. A small heater is attached to the circuit board to simulate a fault on a power module, and then thermal images were captured from different angles and positions, as well as various load currents to test the trained autoencoder model. The areas under the curve (AUC) were obtained to evaluate the proposed method. The results show the autoencoder model can detect anomalies with 100% accuracy under given conditions. The influence of hyperparameters such as the number of convolutional layers and image augmentation conditions on anomaly detection accuracy was also investigated.
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: 19 pages, 6 figures, 2 author photos. Affiliations: Electrical Engineering Department, Pennsylvania State University. Keywords: Lyapunov Stability, Model Predictive Control (MPC), Control Optimization, Koopman Operator, Bilinear Systems, Nonlinear Systems, Closed-loop Design, Control Constraints
Aging-Aware Battery Control via Convex Optimization
We consider the task of controlling a battery while balancing two competing objectives that evolve over different time scales. The short-term objective, such as arbitrage or load smoothing, improves with more battery cycling, while the long-term objective is to maximize battery lifetime, which discourages cycling. Using a semi-empirical aging model, we formulate this problem as a convex optimization problem. We use model predictive control (MPC) with a convex approximation of aging dynamics to optimally manage the trade-off between performance and degradation. Through simulations, we quantify this trade-off in both economic and smoothing applications.
comment: 28 pages, 11 figures
Resource Allocation with Multi-Team Collaboration Based on Hamilton's Rule
This paper presents a multi-team collaboration strategy based on Hamilton's rule from ecology that facilitates resource allocation among multiple teams, where agents are considered as shared resource among all teams that must be allocated appropriately. We construct an algorithmic framework that allows teams to make bids for agents that consider the costs and benefits of transferring agents while also considering relative mission importance for each team. This framework is applied to a multi-team coverage control mission to demonstrate its effectiveness. It is shown that the necessary criteria of a mission evaluation function are met by framing it as a function of the locational coverage cost of each team with respect to agent gain and loss, and these results are illustrated through simulations.
Deep Reinforcement Learning for Power Grid Multi-Stage Cascading Failure Mitigation ICLR 2025
Cascading failures in power grids can lead to grid collapse, causing severe disruptions to social operations and economic activities. In certain cases, multi-stage cascading failures can occur. However, existing cascading-failure-mitigation strategies are usually single-stage-based, overlooking the complexity of the multi-stage scenario. This paper treats the multi-stage cascading failure problem as a reinforcement learning task and develops a simulation environment. The reinforcement learning agent is then trained via the deterministic policy gradient algorithm to achieve continuous actions. Finally, the effectiveness of the proposed approach is validated on the IEEE 14-bus and IEEE 118-bus systems.
comment: This paper has been accepted and presented at ICLR 2025 in Singapore, Apr. 28, 2025
Model-free Online Learning for the Kalman Filter: Forgetting Factor and Logarithmic Regret
We consider the problem of online prediction for an unknown, non-explosive linear stochastic system. With a known system model, the optimal predictor is the celebrated Kalman filter. In the case of unknown systems, existing approaches based on recursive least squares and its variants may suffer from degraded performance due to the highly imbalanced nature of the regression model. This imbalance can easily lead to overfitting and thus degrade prediction accuracy. We tackle this problem by injecting an inductive bias into the regression model via {exponential forgetting}. While exponential forgetting is a common wisdom in online learning, it is typically used for re-weighting data. In contrast, our approach focuses on balancing the regression model. This achieves a better trade-off between {regression} and {regularization errors}, and simultaneously reduces the {accumulation error}. With new proof techniques, we also provide a sharper logarithmic regret bound of $O(\log^3 N)$, where $N$ is the number of observations.
To Stay or to Bypass: Unraveling Mainline Vehicles' Aggregate Strategic Decision-Making at Highway Weaving Ramps
The weaving ramp scenario is a critical bottleneck in highway networks due to conflicting flows and complex interactions among merging, exiting, and through vehicles. In this work, we propose a game-theoretic model to capture and predict the aggregate lane choice behavior of mainline through vehicles as they approach the weaving zone. Faced with potential conflicts from merging and exiting vehicles, mainline vehicles can either bypass the conflict zone by changing to an adjacent lane or stay steadfast in their current lane. Our model effectively captures these strategic choices using a small set of parameters, requiring only limited traffic measurements for calibration. The model's validity is demonstrated through SUMO simulations, achieving high predictive accuracy. The simplicity and flexibility of the proposed framework make it a practical tool for analyzing bottleneck weaving scenarios and informing traffic management strategies.
comment: 8 pages, 2 figures
Multi-step manipulation task and motion planning guided by video demonstration
This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm that allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach to go beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (I) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray similar to a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).
Intelligent Road Anomaly Detection with Real-time Notification System for Enhanced Road Safety
This study aims to improve transportation safety, especially traffic safety. Road damage anomalies such as potholes and cracks have emerged as a significant and recurring cause for accidents. To tackle this problem and improve road safety, a comprehensive system has been developed to detect potholes, cracks (e.g. alligator, transverse, longitudinal), classify their sizes, and transmit this data to the cloud for appropriate action by authorities. The system also broadcasts warning signals to nearby vehicles warning them if a severe anomaly is detected on the road. Moreover, the system can count road anomalies in real-time. It is emulated through the utilization of Raspberry Pi, a camera module, deep learning model, laptop, and cloud service. Deploying this innovative solution aims to proactively enhance road safety by notifying relevant authorities and drivers about the presence of potholes and cracks to take actions, thereby mitigating potential accidents arising from this prevalent road hazard leading to safer road conditions for the whole community.
Data-Driven Input-Output Control Barrier Functions
Control Barrier Functions (CBFs) offer a framework for ensuring set invariance and designing constrained control laws. However, crafting a valid CBF relies on system-specific assumptions and the availability of an accurate system model, underscoring the need for systematic data-driven synthesis methods. This paper introduces a data-driven approach to synthesizing a CBF for discrete-time LTI systems using only input-output measurements. The method begins by computing the maximal control invariant set using an input-output data-driven representation, eliminating the need for precise knowledge of the system's order and explicit state estimation. The proposed CBF is then systematically derived from this set, which can accommodate multiple input-output constraints. Furthermore, the proposed CBF is leveraged to develop a minimally invasive safety filter that ensures recursive feasibility with an adaptive decay rate. To improve clarity, we assume a noise-free dataset, though the method can be extended using data-driven reachability to capture noise effects and handle uncertainty. Finally, the effectiveness of the proposed method is demonstrated on an unknown time-delay system.
Optimization-free Smooth Control Barrier Function for Polygonal Collision Avoidance
Polygonal collision avoidance (PCA) is short for the problem of collision avoidance between two polygons (i.e., polytopes in planar) that own their dynamic equations. This problem suffers the inherent difficulty in dealing with non-smooth boundaries and recently optimization-defined metrics, such as signed distance field (SDF) and its variants, have been proposed as control barrier functions (CBFs) to tackle PCA problems. In contrast, we propose an optimization-free smooth CBF method in this paper, which is computationally efficient and proved to be nonconservative. It is achieved by three main steps: a lower bound of SDF is expressed as a nested Boolean logic composition first, then its smooth approximation is established by applying the latest log-sum-exp method, after which a specified CBF-based safety filter is proposed to address this class of problems. To illustrate its wide applications, the optimization-free smooth CBF method is extended to solve distributed collision avoidance of two underactuated nonholonomic vehicles and drive an underactuated container crane to avoid a moving obstacle respectively, for which numerical simulations are also performed.
Predictive control for nonlinear stochastic systems: Closed-loop guarantees with unbounded noise
We present a stochastic model predictive control framework for nonlinear systems subject to unbounded process noise with closed-loop guarantees. First, we provide a conceptual shrinking-horizon framework that utilizes general probabilistic reachable sets and minimizes the expected cost. Then, we provide a tractable receding-horizon formulation that uses a nominal state to minimize a deterministic quadratic cost and satisfy tightened constraints. Our theoretical analysis demonstrates recursive feasibility, satisfaction of chance constraints, and bounds on the expected cost for the resulting closed-loop system. We provide a constructive design for probabilistic reachable sets of nonlinear continuously differentiable systems using stochastic contraction metrics and an assumed bound on the covariance matrices. Numerical simulations highlight the computational efficiency and theoretical guarantees of the proposed method. Overall, this paper provides a framework for computationally tractable stochastic predictive control with closed-loop guarantees for nonlinear systems with unbounded noise.
comment: Code: https://gitlab.ethz.ch/ics/SMPC-CCM; Added Appendix B to address dynamics nonlinear in noise
A Cooperative Statistical Approach for Abnormal Node Detection with Adversary Resistance
Distinguishing abnormal nodes from those with normal packet loss in clusters helps reduce the loss of clustered network resources. The detection performance of existing detection schemes is limited by the techniques to quantify node behaviors, and most schemes cannot avoid being misled by the falsified information. This paper presents a novel probabilistic abnormal node detection scheme CSD -- Cooperative Statistical Detection -- for accurate and efficient detection in the existence of falsified detection data in clustered networks. Specifically, employing the likelihood ratio test (LRT) based detection method to measure node forwarding behaviors, we propose a modified Z-score based falsification-resistant mechanism to filter out falsifications. We show that both the false alarm and missed detection probabilities can decrease exponentially if and only if the transmissions from the nodes falsifying the data are less than half of the total. Furthermore, the optimal threshold of the modified Z-score method is derived, which guarantees perfect detection of our CSD under any falsification strategy in the proposed detection model. Evaluation results validate the effectiveness, robustness, and superiority of our scheme compared to the state-of-the-art.
comment: 9 pages, 9 figures
Parameter-Varying Feedforward Control: A Kernel-Based Learning Approach
The increasing demands for high accuracy in mechatronic systems necessitate the incorporation of parameter variations in feedforward control. The aim of this paper is to develop a data-driven approach for direct learning of parameter-varying feedforward control to increase tracking performance. The developed approach is based on kernel-regularized function estimation in conjunction with iterative learning to directly learn parameter-varying feedforward control from data. This approach enables high tracking performance for feedforward control of linear parameter-varying dynamics, providing flexibility to varying reference tasks. The developed framework is validated on a benchmark industrial experimental setup featuring a belt-driven carriage.
NExT-LF: A Novel Operational Modal Analysis Method via Tangential Interpolation
Operational Modal Analysis (OMA) is vital for identifying modal parameters under real-world conditions, yet existing methods often face challenges with noise sensitivity and stability. This work introduces NExT-LF, a novel method that combines the well-known Natural Excitation Technique (NExT) with the Loewner Framework (LF). NExT enables the extraction of Impulse Response Functions (IRFs) from output-only vibration data, which are then converted into the frequency domain and used by LF to estimate modal parameters. The proposed method is validated through numerical and experimental case studies. In the numerical study of a 2D Euler-Bernoulli cantilever beam, NExT-LF provides results consistent with analytical solutions and those from standard methods, NExT with Eigensystem Realization Algorithm (NExT-ERA) and Stochastic Subspace Identification with Canonical Variate Analysis (SSI). Additionally, NExT-LF demonstrates superior noise robustness, reliably identifying stable modes across various noise levels where NExT-ERA fails. Experimental validation on the Sheraton Universal Hotel is the first OMA application to this structure, confirming NExT-LF as a robust and efficient method for output-only modal parameter identification.
Signal Temporal Logic Control Synthesis among Uncontrollable Dynamic Agents with Conformal Prediction
The control of dynamical systems under temporal logic specifications among uncontrollable dynamic agents is challenging due to the agents' a-priori unknown behavior. Existing works have considered the problem where either all agents are controllable, the agent models are deterministic and known, or no safety guarantees are provided. We propose a predictive control synthesis framework that guarantees, with high probability, the satisfaction of signal temporal logic (STL) tasks that are defined over a controllable system in the presence of uncontrollable stochastic agents. We use trajectory predictors and conformal prediction to construct probabilistic prediction regions for each uncontrollable agent that are valid over multiple future time steps. Specifically, we construct a normalized prediction region over all agents and time steps to reduce conservatism and increase data efficiency. We then formulate a worst-case bilevel mixed integer program (MIP) that accounts for all agent realizations within the prediction region to obtain an open-loop controller that provably guarantee task satisfaction with high probability. To efficiently solve this bilevel MIP, we propose an equivalent MIP program based on KKT conditions of the original bilevel formulation. Building upon this, we design a closed-loop controller, where both recursive feasibility and task satisfaction can be guaranteed with high probability. We illustrate our control synthesis framework on two case studies.
UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue
Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. In order to address these scenarios, a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response), is developed in this research. This system consists of two aspects: 1) A multimodal system which harnesses the power of Visual Language Model (VLM) and the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation. 2) A non-linearmodel predictive control (NMPC) with built-in obstacle avoidance for rapid response by a drone to fly according to the output of the multimodal system. This work aims at improving response times in emergency SAR operations by providing a more intuitive and natural approach to the operator to plan the SAR mission while allowing the drone to carry out that mission in a rapid and safe manner. When tested, our approach was faster on an average by 33.75% when compared with an off-the-shelf autopilot and 54.6% when compared with a human pilot. Video of UAV-VLRR: https://youtu.be/KJqQGKKt1xY
comment: UAV-VLRR
An Overview of the Prospects and Challenges of Using Artificial Intelligence for Energy Management Systems in Microgrids
Microgrids have emerged as a pivotal solution in the quest for a sustainable and energy-efficient future. While microgrids offer numerous advantages, they are also prone to issues related to reliably forecasting renewable energy demand and production, protecting against cyberattacks, controlling operational costs, optimizing power flow, and regulating the performance of energy management systems (EMS). Tackling these energy management challenges is essential to facilitate microgrid applications and seamlessly incorporate renewable energy resources. Artificial intelligence (AI) has recently demonstrated immense potential for optimizing energy management in microgrids, providing efficient and reliable solutions. This paper highlights the combined benefits of enabling AI-based methodologies in the energy management systems of microgrids by examining the applicability and efficiency of AI-based EMS in achieving specific technical and economic objectives. The paper also points out several future research directions that promise to spearhead AI-driven EMS, namely the development of self-healing microgrids, integration with blockchain technology, use of Internet of things (IoT), and addressing interpretability, data privacy, scalability, and the prospects to generative AI in the context of future AI-based EMS.
comment: 62 pages, 6 figures
Systematic interval observer design for linear systems
We first develop systematic and comprehensive interval observer designs for linear time-invariant (LTI) systems, under standard assumptions of observability and interval bounds on the initial condition and uncertainties. Traditionally, such designs rely on specific transformations into Metzler (in continuous time) or non-negative (in discrete time) forms, which may impose limitations. We demonstrate that these can be effectively replaced by an LTI transformation that is straightforward to compute offline. Subsequently, we extend the framework to time-varying systems, overcoming the limitations of conventional approaches that offer no guarantees. Our method utilizes dynamic transformations into higher-dimensional target systems, for which interval observers can always be constructed. These transformations become left-invertible after a finite time, provided the system is observable and the target dynamics are sufficiently high-dimensional and fast, thereby enabling the recovery of interval bounds in the original coordinates. Academic examples are provided to illustrate the proposed methodology.
comment: 7 pages, 3 figures
Systems and Control (EESS)
Load-independent Metrics for Benchmarking Force Controllers
Torque-controlled actuators are critical components in mechatronic systems that closely interact with their environment, such as legged robots, collaborative manipulators, and exoskeletons. The performance and stability of these actuators depend not only on controller design and system dynamics but also significantly on load characteristics, which may include interactions with humans or unstructured environments. This load dependence highlights the need for frameworks that properly assess and compare torque controllers independent of specific loading conditions. In this short paper, we concisely present a modeling approach that captures the impact of load on the closed-loop dynamics of torque-controlled systems. Based on this model, we propose new methods and quantitative metrics, including the Passivity Index Interval, which blends passivity and small-gain theory to offer a less conservative measure of coupled stability than passivity alone. These metrics can be used alongside traditional control performance indicators, such as settling time and bandwidth, to provide a more comprehensive characterization of torque-controlled systems. We demonstrate the application of the proposed metrics through experimental comparisons of linear actuator force controllers.
Optimal Trajectory Planning with Collision Avoidance for Autonomous Vehicle Maneuvering
To perform autonomous driving maneuvers, such as parallel or perpendicular parking, a vehicle requires continual speed and steering adjustments to follow a generated path. In consequence, the path's quality is a limiting factor of the vehicle maneuver's performance. While most path planning approaches include finding a collision-free route, optimal trajectory planning involves solving the best transition from initial to final states, minimizing the action over all paths permitted by a kinematic model. Here we propose a novel method based on sequential convex optimization, which permits flexible and efficient optimal trajectory generation. The objective is to achieve the fastest time, shortest distance, and fewest number of path segments to satisfy motion requirements, while avoiding sensor blind-spots. In our approach, vehicle kinematics are represented by a discretized Dubins model. To avoid collisions, each waypoint is constrained by linear inequalities representing closest distance of obstacles to a polygon specifying the vehicle's extent. To promote smooth and valid trajectories, the solved kinematic state and control variables are constrained and regularized by penalty terms in the model's cost function, which enforces physical restrictions including limits for steering angle, acceleration and speed. In this paper, we analyze trajectories obtained for several parking scenarios. Results demonstrate efficient and collision-free motion generated by the proposed technique.
comment: SIAM Conference on Control and Its Applications, July 28-30th, 2025, Montreal, Canada
Joint Communication Scheduling and Resource Allocation for Distributed Edge Learning: Seamless Integration in Next-Generation Wireless Networks
Distributed edge learning (DL) is considered a cornerstone of intelligence enablers, since it allows for collaborative training without the necessity for local clients to share raw data with other parties, thereby preserving privacy and security. Integrating DL into the 6G networks requires a coexistence design with existing services such as high-bandwidth (HB) traffic like eMBB. Current designs in the literature mainly focus on communication round-wise designs that assume a rigid resource allocation throughout each communication round (CR). However, rigid resource allocation within a CR is a highly inefficient and inaccurate representation of the system's realistic behavior. This is due to the heterogeneous nature of the system, as clients inherently may need to access the network at different times. This work zooms into one arbitrary CR, and demonstrates the importance of considering a time-dependent resource sharing design with HB traffic. We first formulate a time-step-wise optimization problem to minimize the consumed time by DL within the CR while constrained by a DL energy budget. Due to its intractability, a session-based optimization problem is formulated assuming a CR lasts less than a large-scale coherence time. Some scheduling properties of such multi-server joint communication scheduling and resource allocation framework have been established. An iterative algorithm has been designed to solve such non-convex and non-block-separable-constrained problems. Simulation results confirm the importance of the efficient and accurate integration design proposed in this work.
comment: This work has been submitted to the IEEE for possible publication
Robust Indoor Localization via Conformal Methods and Variational Bayesian Adaptive Filtering
Indoor localization is critical for IoT applications, yet challenges such as non-Gaussian noise, environmental interference, and measurement outliers hinder the robustness of traditional methods. Existing approaches, including Kalman filtering and its variants, often rely on Gaussian assumptions or static thresholds, limiting adaptability in dynamic environments. This paper proposes a hierarchical robust framework integrating Variational Bayesian (VB) parameter learning, Huber M-estimation, and Conformal Outlier Detection (COD) to address these limitations. First, VB inference jointly estimates state and noise parameters, adapting to time-varying uncertainties. Second, Huber-based robust filtering suppresses mild outliers while preserving Gaussian efficiency. Third, COD provides statistical guarantees for outlier detection via dynamically calibrated thresholds, ensuring a user-controlled false alarm rate. Theoretically, we prove the Semi-positive Definiteness of Huber-based Kalman filtering covariance and the coverage of sliding window conformal prediction. Experiments on geomagnetic fingerprint datasets demonstrate significant improvements: fingerprint matching accuracy increases from 81.25% to 93.75%, and positioning errors decrease from 0.62-6.87 m to 0.03-0.35 m. Comparative studies further validate the framework's robustness, showing consistent performance gains under non-Gaussian noise and outlier conditions.
MC-Swarm: Minimal-Communication Multi-Agent Trajectory Planning and Deadlock Resolution for Quadrotor Swarm
For effective multi-agent trajectory planning, it is important to consider lightweight communication and its potential asynchrony. This paper presents a distributed trajectory planning algorithm for a quadrotor swarm that operates asynchronously and requires no communication except during the initial planning phase. Moreover, our algorithm guarantees no deadlock under asynchronous updates and absence of communication during flight. To effectively ensure these points, we build two main modules: coordination state updater and trajectory optimizer. The coordination state updater computes waypoints for each agent toward its goal and performs subgoal optimization while considering deadlocks, as well as safety constraints with respect to neighbor agents and obstacles. Then, the trajectory optimizer generates a trajectory that ensures collision avoidance even with the asynchronous planning updates of neighboring agents. We provide a theoretical guarantee of collision avoidance with deadlock resolution and evaluate the effectiveness of our method in complex simulation environments, including random forests and narrow-gap mazes. Additionally, to reduce the total mission time, we design a faster coordination state update using lightweight communication. Lastly, our approach is validated through extensive simulations and real-world experiments with cluttered environment scenarios.
comment: 13 pages, 11 figures
Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints
This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents, where the nonconvex local loss and convex local constraint functions can vary arbitrarily across iterations, and the information of them is privately revealed to each agent at each iteration. For a uniformly jointly strongly connected time-varying directed graph, we propose two distributed bandit online primal--dual algorithm with compressed communication to efficiently utilize communication resources in the one-point and two-point bandit feedback settings, respectively. In nonconvex optimization, finding a globally optimal decision is often NP-hard. As a result, the standard regret metric used in online convex optimization becomes inapplicable. To measure the performance of the proposed algorithms, we use a network regret metric grounded in the first-order optimality condition associated with the variational inequality. We show that the compressed algorithms establish sublinear network regret and cumulative constraint violation bounds. Finally, a simulation example is presented to validate the theoretical results.
comment: 56 pages, 3 figures
Synthesis of safety certificates for discrete-time uncertain systems via convex optimization
We study the problem of co-designing control barrier functions and linear state feedback controllers for discrete-time linear systems affected by additive disturbances. For disturbances of bounded magnitude, we provide a semi-definite program whose feasibility implies the existence of a control law and a certificate ensuring safety in the infinite horizon with respect to the worst-case disturbance realization in the uncertainty set. For disturbances with unbounded support, we rely on martingale theory to derive a second semi-definite program whose feasibility provides probabilistic safety guarantees holding joint-in-time over a finite time horizon. We examine several extensions, including (i) encoding of different types of input constraints, (ii) robustification against distributional ambiguity around the true distribution, (iii) design of safety filters, and (iv) extension to general safety specifications such as obstacle avoidance.
A High-Efficiency Reconfigurable Bidirectional Array Antenna Based on Transmit-Reflect Switchable Metasurface
This paper proposes a reconfigurable bidirectional array antenna with high-efficiency radiations and flexible beam-switching capability by employing a novel transmit-reflect switchable metasurface (TRSM). To realize the electromagnetic (EM) wave transmitted or reflected manipulation, a dedicated transmit-reflect switch layer (TRSL) with periodically soldered PIN diodes is introduced between two transmitted metasurfaces. By switching ON/OFF the embedded diodes, the TRSL performs as a mesh-type ground layer or polarization-grid layer, exhibiting a reflect or transmit property to the incident wave respectively. Further, utilizing the above TRSM configuration in conjunction with a microstrip feed antenna, bidirectional radiations are obtained at the same frequency and polarization. To further reduce the number of PIN diodes and control complexity, an enhanced TRSM using a single diode to control two unit cells is also investigated, resulting in half PIN diodes reduction. Since the bidirectional beam-switching is achieved by only controlling PIN diodes integrated in the ground plane instead of directly acting on the radiation element, which reduces insertion loss and avoids phase quantization errors, the proposed antenna can maintain a high aperture efficiency. To verify this concept, a prototype was designed, fabricated, and measured, demonstrating a successful realization of backward and forward patterns with peak gains of 22.3 and 22.1 dBi, and aperture efficiencies of 47.2% and 43.8%. The 3-dB gain bandwidths of reflected and transmitted modes are 13.7% and 12.3%. This antenna has the advantages of high gain, high aperture efficiency, simple configuration, cost-effectiveness, and flexible and digital beam control.
comment: 11 pages, 18 figures, published to TAP
Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation
This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics, thereby eliminating a barrier when applying MPC to a renewables-dominated power system. Case study results on an industry park system and the IEEE 30-bus system demonstrate that using the diffusion model to augment the training dataset significantly improves load-forecasting accuracy, and the inferred system dynamics are applicable to the real-time grid operation with solar and wind.
comment: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM) which will be held in Austin, TX, July.27-31, 2005
Max-Min Fairness in Stacked Intelligent Metasurface-Aided Rate Splitting Networks
This paper investigates a downlink multiuser multiple-input single-output system that integrates rate-splitting multiple access (RSMA) with a stacked intelligent metasurface (SIM) to enable wave-domain beamforming. Unlike conventional digital beamforming, the proposed system leverages the programmable phase shifts of the SIM to perform beamforming entirely in the wave domain. In contrast to existing literature, this work introduces a fairness-centric SIM-RSMA design that shifts the emphasis from maximizing sum-rate to ensuring fair allocation of resources. In particular, we formulate a max-min rate optimization problem that jointly optimizes transmit power coefficients at the base station and SIM phase shifts. Given the non-convex nature of this problem, we develop an alternating optimization framework, where the power allocation is optimized through successive convex approximation and SIM beamforming is optimized using the Riemannian conjugate gradient method. Simulation results indicate that combining SIM with RSMA yields superior max-min performance compared to its integration with space division multiple access or non-orthogonal multiple access.
The Quadrature Gaussian Sum Filter and Smoother for Wiener Systems
Block-Oriented Nonlinear (BONL) models, particularly Wiener models, are widely used for their computational efficiency and practicality in modeling nonlinear behaviors in physical systems. Filtering and smoothing methods for Wiener systems, such as particle filters and Kalman-based techniques, often struggle with computational feasibility or accuracy. This work addresses these challenges by introducing a novel Gaussian Sum Filter for Wiener system state estimation that is built on a Gauss-Legendre quadrature approximation of the likelihood function associated with the output signal. In addition to filtering, a two-filter smoothing strategy is proposed, enabling accurate computation of smoothed state distributions at single and consecutive time instants. Numerical examples demonstrate the superiority of the proposed method in balancing accuracy and computational efficiency compared to traditional approaches, highlighting its benefits in control, state estimation and system identification, for Wiener systems.
comment: This work has been submitted to the IEEE for possible publication. 13 pages, 9 Figures
A Practical Approach to Generating First-Order Rician Channel Statistics in a RC plus CATR Chamber at mmWave
This paper explores a novel hybrid configuration integrating a Reverberation Chamber (RC) with a Compact Antenna Test Range (CATR) to achieve a controllable Rician K-factor. The focus is testing directive antennas in the lower FR2 frequency bands (24.25-29.5 GHz) for 5G and beyond wireless applications. The study meticulously evaluates 39 unique configurations, using a stationary horn antenna for consistent reference K-factor characterization, and considers variables like absorbers and CATR polarization. Results demonstrate that the K-factor can be effectively adjusted within the hybrid setup, maintaining substantial margins above the noise level across all configurations. Sample independence is confirmed for at least 600 samples in all cases. The Bootstrap Anderson-Darling goodness-of-fit test verifies that the data align with Rician or Rayleigh distributions. Analysis of total received power, stirred and unstirred power and frequency-dependent modeling reveals that power variables are inversely related to frequency, while the K-factor remains frequency-independent. The hybrid RC-CATR system achieves a wide range of frequency-averaged K-factors from -9.2 dB to 40.8 dB, with an average granularity of 1.3 dB. Notably, configurations using co-polarized CATR signals yield large K-factors, reduced system losses, and improved frequency stability, underscoring the system's efficacy for millimeter-wave over-the-air testing. This research offers a cost-efficient and repeatable method for generating complex Rician fading channels at mmWave frequencies, crucial for the effective OTA testing of advanced wireless devices.
On the Use of CVRP to Diagnose Faulty Elements in Antenna Arrays
This paper investigates the application of Constrained-View Radiated Power (CVRP) for diagnosing phased array element failures, specifically focusing on on-off element failure. CVRP, similar to Partial Radiated Power (PRP), considers a specific Field-of-View (FoV) but normalizes it by the FoV area. The study explores CVRP's effectiveness in detecting failures in a 2x8 cosine element array under beam-steering conditions, accounting for random and depointing errors, angular resolution, and pattern rotation. Results indicate that CVRP can detect on-off failures based on angular resolution and error severity, under the assumption of reduced Total Radiated Power (TRP) with element failures. Additionally, CVRP is effective with partial far-field patterns, making it suitable for near-field, indirect far-field, and far-field measurement systems without requiring phase acquisition in the latter two.
comment: Presented at EuCAP 2025
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning IROS 2025
Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UAV operations require power-efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable-size axis-aligned rectangles and UAV motion with curvature-constrained B\'ezier curves. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies.
comment: Submitted to IROS 2025
Distributionally Robust LQG with Kullback-Leibler Ambiguity Sets
The Linear Quadratic Gaussian (LQG) controller is known to be inherently fragile to model misspecifications often occurring in real-world situations. We consider discretetime partially observable stochastic linear systems and provide a robustification of the standard LQG against distributional uncertainties on the process and measurement noise. Our distributionally robust formulation specifies the admissible perturbations by defining a relative entropy based ambiguity set individually for each time step along a finite-horizon trajectory, and minimizes the worst-case cost across all admissible distributions. Notably, we prove that the optimal control policy is still linear, as in standard LQG, and we derive a computational scheme grounded on iterative best response that provably converges to the set of saddle points. Finally, we consider the case of endogenous uncertainty captured via decision-dependent ambiguity sets and we propose an approximation scheme based on dynamic programming.
A spherical amplitude-phase formulation for 3-D adaptive line-of-sight (ALOS) guidance with USGES stability guarantees
A recently proposed 3-D adaptive line-of-sight (ALOS) path-following algorithm addressed coupled motion dynamics of marine craft, aircraft, and uncrewed vehicles under environmental disturbances such as wind, waves, and ocean currents. Stability analysis established uniform semiglobal exponential stability (USGES) of the cross- and vertical-track errors using a body-velocity-based amplitude-phase representation of the North-East-Down (NED) kinematic differential equations. In this brief paper, we revisit the ALOS framework and introduce a novel spherical amplitude-phase representation. This formulation yields a more geometrically intuitive and physically observable description of the guidance errors and enables a significantly simplified stability proof. Unlike the previous model, which relied on a vertical crab angle derived from body-frame velocities, the new representation uses an alternative vertical crab angle and retains the USGES property. It also removes restrictive assumptions such as constant altitude/depth or zero horizontal crab angle, and remains valid for general 3-D maneuvers with nonzero roll, pitch, and flight-path angles.
comment: 8 pages, 1 figure
Fast Contact Detection via Fusion of Joint and Inertial Sensors for Parallel Robots in Human-Robot Collaboration
Fast contact detection is crucial for safe human-robot collaboration. Observers based on proprioceptive information can be used for contact detection but have first-order error dynamics, which results in delays. Sensor fusion based on inertial measurement units (IMUs) consisting of accelerometers and gyroscopes is advantageous for reducing delays. The acceleration estimation enables the direct calculation of external forces. For serial robots, the installation of multiple accelerometers and gyroscopes is required for dynamics modeling since the joint coordinates are the minimal coordinates. Alternatively, parallel robots (PRs) offer the potential to use only one IMU on the end-effector platform, which already presents the minimal coordinates of the PR. This work introduces a sensor-fusion method for contact detection using encoders and only one low-cost, consumer-grade IMU for a PR. The end-effector accelerations are estimated by an extended Kalman filter and incorporated into the dynamics to calculate external forces. In real-world experiments with a planar PR, we demonstrate that this approach reduces the detection duration by up to 50% compared to a momentum observer and enables the collision and clamping detection within 3-39ms.
comment: Preprint of a publication accepted for IEEE Robotics and Automation Letters
Constrained Factor Graph Optimization for Robust Networked Pedestrian Inertial Navigation
This paper presents a novel constrained Factor Graph Optimization (FGO)-based approach for networked inertial navigation in pedestrian localization. To effectively mitigate the drift inherent in inertial navigation solutions, we incorporate kinematic constraints directly into the nonlinear optimization framework. Specifically, we utilize equality constraints, such as Zero-Velocity Updates (ZUPTs), and inequality constraints representing the maximum allowable distance between body-mounted Inertial Measurement Units (IMUs) based on human anatomical limitations. While equality constraints are straightforwardly integrated as error factors, inequality constraints cannot be explicitly represented in standard FGO formulations. To address this, we introduce a differentiable softmax-based penalty term in the FGO cost function to enforce inequality constraints smoothly and robustly. The proposed constrained FGO approach leverages temporal correlations across multiple epochs, resulting in optimal state trajectory estimates while consistently maintaining constraint satisfaction. Experimental results confirm that our method outperforms conventional Kalman filter approaches, demonstrating its effectiveness and robustness for pedestrian navigation.
comment: 6 pages, 5 figures. Accepted by 2025 IEEE/ION Position, Location and Navigation Symposium (PLANS)
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
A Tightly Coupled IMU-Based Motion Capture Approach for Estimating Multibody Kinematics and Kinetics
Inertial Measurement Units (IMUs) enable portable, multibody motion capture (MoCap) in diverse environments beyond the laboratory, making them a practical choice for diagnosing mobility disorders and supporting rehabilitation in clinical or home settings. However, challenges associated with IMU measurements, including magnetic distortions and drift errors, complicate their broader use for MoCap. In this work, we propose a tightly coupled motion capture approach that directly integrates IMU measurements with multibody dynamic models via an Iterated Extended Kalman Filter (IEKF) to simultaneously estimate the system's kinematics and kinetics. By enforcing kinematic and kinetic properties and utilizing only accelerometer and gyroscope data, our method improves IMU-based state estimation accuracy. Our approach is designed to allow for incorporating additional sensor data, such as optical MoCap measurements and joint torque readings, to further enhance estimation accuracy. We validated our approach using highly accurate ground truth data from a 3 Degree of Freedom (DoF) pendulum and a 6 DoF Kuka robot. We demonstrate a maximum Root Mean Square Difference (RMSD) in the pendulum's computed joint angles of 3.75 degrees compared to optical MoCap Inverse Kinematics (IK), which serves as the gold standard in the absence of internal encoders. For the Kuka robot, we observe a maximum joint angle RMSD of 3.24 degrees compared to the Kuka's internal encoders, while the maximum joint angle RMSD of the optical MoCap IK compared to the encoders was 1.16 degrees. Additionally, we report a maximum joint torque RMSD of 2 Nm in the pendulum compared to optical MoCap Inverse Dynamics (ID), and 3.73 Nm in the Kuka robot relative to its internal torque sensors.
Non-Blocking Robustness Analysis in Discrete Event Systems
This paper presents a mathematical framework for characterizing state blocking in discrete event systems (DES) under transition deletions. We introduce a path-based analysis approach that determines whether systems maintain non-blocking properties when transitions are removed. Through formal analysis and case studies, we establish three key contributions: a mathematical characterization of transition-induced blocking with necessary and sufficient conditions, a definition of robust deviations that preserve non-blocking properties, and an algorithm for identifying critical transitions and analyzing system behavior under deletions. Our algorithm reduces computational complexity by leveraging minimal blocking sets, achieving significant reduction in computational requirements. We demonstrate the framework's effectiveness through manufacturing system and autonomous vehicle case studies, showing substantial improvements in identifying critical transitions and predicting potential blocking scenarios across different application domains.
comment: 9 pages, 8 figures, conference paper submitted to IEEE
Fault Detection Method for Power Conversion Circuits Using Thermal Image and Convolutional Autoencoder
A fault detection method for power conversion circuits using thermal images and a convolutional autoencoder is presented. The autoencoder is trained on thermal images captured from a commercial power module at randomly varied load currents and augmented image2 generated through image processing techniques such as resizing, rotation, perspective transformation, and bright and contrast adjustment. Since the autoencoder is trained to output images identical to input only for normal samples, it reconstructs images similar to normal ones even when the input images containing faults. A small heater is attached to the circuit board to simulate a fault on a power module, and then thermal images were captured from different angles and positions, as well as various load currents to test the trained autoencoder model. The areas under the curve (AUC) were obtained to evaluate the proposed method. The results show the autoencoder model can detect anomalies with 100% accuracy under given conditions. The influence of hyperparameters such as the number of convolutional layers and image augmentation conditions on anomaly detection accuracy was also investigated.
Integrating Koopman theory and Lyapunov stability for enhanced model predictive control in nonlinear systems
This paper delves into the challenges posed by the increasing complexity of modern control systems, specifically focusing on bilinear systems, a prevalent subclass of non-linear systems characterized by state dynamics influenced by the interaction of state and control variables. Traditional control strategies, such as PID controllers, often fall short in adequately addressing the intricacies of such systems due to their predictive limitations. To bridge this gap, we introduce Model Predictive Control (MPC), a sophisticated technique that utilizes system models to forecast future behaviors, allowing for the computation of an optimal control sequence by minimizing deviations and control efforts. The Koopman operator emerges as a pivotal tool in this framework by providing a means to linearize the nonlinear dynamics of bilinear systems. By integrating the principles of Lyapunov theory with the linearizing capabilities of the Koopman operator into the MPC framework, we give rise to Koopman Lyapunov-based Model Predictive Control (Koopman LMPC). This approach not only retains MPC's predictive capabilities but also harnesses the Koopman operator's ability to transform complex nonlinear behaviors into a linear framework, thereby enhancing the robustness and applicability of LMPC. With the stability assurances from Lyapunov theory, Koopman LMPC presents a robust solution to effectively control and stabilize bilinear systems. The paper underscores the efficacy of Koopman LMPC, emphasizing its significance in achieving optimal performance and system stability, marking it as a promising approach for the future of advanced control systems.
comment: 19 pages, 6 figures, 2 author photos. Affiliations: Electrical Engineering Department, Pennsylvania State University. Keywords: Lyapunov Stability, Model Predictive Control (MPC), Control Optimization, Koopman Operator, Bilinear Systems, Nonlinear Systems, Closed-loop Design, Control Constraints
Aging-Aware Battery Control via Convex Optimization
We consider the task of controlling a battery while balancing two competing objectives that evolve over different time scales. The short-term objective, such as arbitrage or load smoothing, improves with more battery cycling, while the long-term objective is to maximize battery lifetime, which discourages cycling. Using a semi-empirical aging model, we formulate this problem as a convex optimization problem. We use model predictive control (MPC) with a convex approximation of aging dynamics to optimally manage the trade-off between performance and degradation. Through simulations, we quantify this trade-off in both economic and smoothing applications.
comment: 28 pages, 11 figures
Resource Allocation with Multi-Team Collaboration Based on Hamilton's Rule
This paper presents a multi-team collaboration strategy based on Hamilton's rule from ecology that facilitates resource allocation among multiple teams, where agents are considered as shared resource among all teams that must be allocated appropriately. We construct an algorithmic framework that allows teams to make bids for agents that consider the costs and benefits of transferring agents while also considering relative mission importance for each team. This framework is applied to a multi-team coverage control mission to demonstrate its effectiveness. It is shown that the necessary criteria of a mission evaluation function are met by framing it as a function of the locational coverage cost of each team with respect to agent gain and loss, and these results are illustrated through simulations.
Deep Reinforcement Learning for Power Grid Multi-Stage Cascading Failure Mitigation ICLR 2025
Cascading failures in power grids can lead to grid collapse, causing severe disruptions to social operations and economic activities. In certain cases, multi-stage cascading failures can occur. However, existing cascading-failure-mitigation strategies are usually single-stage-based, overlooking the complexity of the multi-stage scenario. This paper treats the multi-stage cascading failure problem as a reinforcement learning task and develops a simulation environment. The reinforcement learning agent is then trained via the deterministic policy gradient algorithm to achieve continuous actions. Finally, the effectiveness of the proposed approach is validated on the IEEE 14-bus and IEEE 118-bus systems.
comment: This paper has been accepted and presented at ICLR 2025 in Singapore, Apr. 28, 2025
Model-free Online Learning for the Kalman Filter: Forgetting Factor and Logarithmic Regret
We consider the problem of online prediction for an unknown, non-explosive linear stochastic system. With a known system model, the optimal predictor is the celebrated Kalman filter. In the case of unknown systems, existing approaches based on recursive least squares and its variants may suffer from degraded performance due to the highly imbalanced nature of the regression model. This imbalance can easily lead to overfitting and thus degrade prediction accuracy. We tackle this problem by injecting an inductive bias into the regression model via {exponential forgetting}. While exponential forgetting is a common wisdom in online learning, it is typically used for re-weighting data. In contrast, our approach focuses on balancing the regression model. This achieves a better trade-off between {regression} and {regularization errors}, and simultaneously reduces the {accumulation error}. With new proof techniques, we also provide a sharper logarithmic regret bound of $O(\log^3 N)$, where $N$ is the number of observations.
To Stay or to Bypass: Unraveling Mainline Vehicles' Aggregate Strategic Decision-Making at Highway Weaving Ramps
The weaving ramp scenario is a critical bottleneck in highway networks due to conflicting flows and complex interactions among merging, exiting, and through vehicles. In this work, we propose a game-theoretic model to capture and predict the aggregate lane choice behavior of mainline through vehicles as they approach the weaving zone. Faced with potential conflicts from merging and exiting vehicles, mainline vehicles can either bypass the conflict zone by changing to an adjacent lane or stay steadfast in their current lane. Our model effectively captures these strategic choices using a small set of parameters, requiring only limited traffic measurements for calibration. The model's validity is demonstrated through SUMO simulations, achieving high predictive accuracy. The simplicity and flexibility of the proposed framework make it a practical tool for analyzing bottleneck weaving scenarios and informing traffic management strategies.
comment: 8 pages, 2 figures
Multi-step manipulation task and motion planning guided by video demonstration
This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm that allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach to go beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (I) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray similar to a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).
Intelligent Road Anomaly Detection with Real-time Notification System for Enhanced Road Safety
This study aims to improve transportation safety, especially traffic safety. Road damage anomalies such as potholes and cracks have emerged as a significant and recurring cause for accidents. To tackle this problem and improve road safety, a comprehensive system has been developed to detect potholes, cracks (e.g. alligator, transverse, longitudinal), classify their sizes, and transmit this data to the cloud for appropriate action by authorities. The system also broadcasts warning signals to nearby vehicles warning them if a severe anomaly is detected on the road. Moreover, the system can count road anomalies in real-time. It is emulated through the utilization of Raspberry Pi, a camera module, deep learning model, laptop, and cloud service. Deploying this innovative solution aims to proactively enhance road safety by notifying relevant authorities and drivers about the presence of potholes and cracks to take actions, thereby mitigating potential accidents arising from this prevalent road hazard leading to safer road conditions for the whole community.
Data-Driven Input-Output Control Barrier Functions
Control Barrier Functions (CBFs) offer a framework for ensuring set invariance and designing constrained control laws. However, crafting a valid CBF relies on system-specific assumptions and the availability of an accurate system model, underscoring the need for systematic data-driven synthesis methods. This paper introduces a data-driven approach to synthesizing a CBF for discrete-time LTI systems using only input-output measurements. The method begins by computing the maximal control invariant set using an input-output data-driven representation, eliminating the need for precise knowledge of the system's order and explicit state estimation. The proposed CBF is then systematically derived from this set, which can accommodate multiple input-output constraints. Furthermore, the proposed CBF is leveraged to develop a minimally invasive safety filter that ensures recursive feasibility with an adaptive decay rate. To improve clarity, we assume a noise-free dataset, though the method can be extended using data-driven reachability to capture noise effects and handle uncertainty. Finally, the effectiveness of the proposed method is demonstrated on an unknown time-delay system.
Optimization-free Smooth Control Barrier Function for Polygonal Collision Avoidance
Polygonal collision avoidance (PCA) is short for the problem of collision avoidance between two polygons (i.e., polytopes in planar) that own their dynamic equations. This problem suffers the inherent difficulty in dealing with non-smooth boundaries and recently optimization-defined metrics, such as signed distance field (SDF) and its variants, have been proposed as control barrier functions (CBFs) to tackle PCA problems. In contrast, we propose an optimization-free smooth CBF method in this paper, which is computationally efficient and proved to be nonconservative. It is achieved by three main steps: a lower bound of SDF is expressed as a nested Boolean logic composition first, then its smooth approximation is established by applying the latest log-sum-exp method, after which a specified CBF-based safety filter is proposed to address this class of problems. To illustrate its wide applications, the optimization-free smooth CBF method is extended to solve distributed collision avoidance of two underactuated nonholonomic vehicles and drive an underactuated container crane to avoid a moving obstacle respectively, for which numerical simulations are also performed.
Predictive control for nonlinear stochastic systems: Closed-loop guarantees with unbounded noise
We present a stochastic model predictive control framework for nonlinear systems subject to unbounded process noise with closed-loop guarantees. First, we provide a conceptual shrinking-horizon framework that utilizes general probabilistic reachable sets and minimizes the expected cost. Then, we provide a tractable receding-horizon formulation that uses a nominal state to minimize a deterministic quadratic cost and satisfy tightened constraints. Our theoretical analysis demonstrates recursive feasibility, satisfaction of chance constraints, and bounds on the expected cost for the resulting closed-loop system. We provide a constructive design for probabilistic reachable sets of nonlinear continuously differentiable systems using stochastic contraction metrics and an assumed bound on the covariance matrices. Numerical simulations highlight the computational efficiency and theoretical guarantees of the proposed method. Overall, this paper provides a framework for computationally tractable stochastic predictive control with closed-loop guarantees for nonlinear systems with unbounded noise.
comment: Code: https://gitlab.ethz.ch/ics/SMPC-CCM; Added Appendix B to address dynamics nonlinear in noise
A Cooperative Statistical Approach for Abnormal Node Detection with Adversary Resistance
Distinguishing abnormal nodes from those with normal packet loss in clusters helps reduce the loss of clustered network resources. The detection performance of existing detection schemes is limited by the techniques to quantify node behaviors, and most schemes cannot avoid being misled by the falsified information. This paper presents a novel probabilistic abnormal node detection scheme CSD -- Cooperative Statistical Detection -- for accurate and efficient detection in the existence of falsified detection data in clustered networks. Specifically, employing the likelihood ratio test (LRT) based detection method to measure node forwarding behaviors, we propose a modified Z-score based falsification-resistant mechanism to filter out falsifications. We show that both the false alarm and missed detection probabilities can decrease exponentially if and only if the transmissions from the nodes falsifying the data are less than half of the total. Furthermore, the optimal threshold of the modified Z-score method is derived, which guarantees perfect detection of our CSD under any falsification strategy in the proposed detection model. Evaluation results validate the effectiveness, robustness, and superiority of our scheme compared to the state-of-the-art.
comment: 9 pages, 9 figures
Parameter-Varying Feedforward Control: A Kernel-Based Learning Approach
The increasing demands for high accuracy in mechatronic systems necessitate the incorporation of parameter variations in feedforward control. The aim of this paper is to develop a data-driven approach for direct learning of parameter-varying feedforward control to increase tracking performance. The developed approach is based on kernel-regularized function estimation in conjunction with iterative learning to directly learn parameter-varying feedforward control from data. This approach enables high tracking performance for feedforward control of linear parameter-varying dynamics, providing flexibility to varying reference tasks. The developed framework is validated on a benchmark industrial experimental setup featuring a belt-driven carriage.
NExT-LF: A Novel Operational Modal Analysis Method via Tangential Interpolation
Operational Modal Analysis (OMA) is vital for identifying modal parameters under real-world conditions, yet existing methods often face challenges with noise sensitivity and stability. This work introduces NExT-LF, a novel method that combines the well-known Natural Excitation Technique (NExT) with the Loewner Framework (LF). NExT enables the extraction of Impulse Response Functions (IRFs) from output-only vibration data, which are then converted into the frequency domain and used by LF to estimate modal parameters. The proposed method is validated through numerical and experimental case studies. In the numerical study of a 2D Euler-Bernoulli cantilever beam, NExT-LF provides results consistent with analytical solutions and those from standard methods, NExT with Eigensystem Realization Algorithm (NExT-ERA) and Stochastic Subspace Identification with Canonical Variate Analysis (SSI). Additionally, NExT-LF demonstrates superior noise robustness, reliably identifying stable modes across various noise levels where NExT-ERA fails. Experimental validation on the Sheraton Universal Hotel is the first OMA application to this structure, confirming NExT-LF as a robust and efficient method for output-only modal parameter identification.
Signal Temporal Logic Control Synthesis among Uncontrollable Dynamic Agents with Conformal Prediction
The control of dynamical systems under temporal logic specifications among uncontrollable dynamic agents is challenging due to the agents' a-priori unknown behavior. Existing works have considered the problem where either all agents are controllable, the agent models are deterministic and known, or no safety guarantees are provided. We propose a predictive control synthesis framework that guarantees, with high probability, the satisfaction of signal temporal logic (STL) tasks that are defined over a controllable system in the presence of uncontrollable stochastic agents. We use trajectory predictors and conformal prediction to construct probabilistic prediction regions for each uncontrollable agent that are valid over multiple future time steps. Specifically, we construct a normalized prediction region over all agents and time steps to reduce conservatism and increase data efficiency. We then formulate a worst-case bilevel mixed integer program (MIP) that accounts for all agent realizations within the prediction region to obtain an open-loop controller that provably guarantee task satisfaction with high probability. To efficiently solve this bilevel MIP, we propose an equivalent MIP program based on KKT conditions of the original bilevel formulation. Building upon this, we design a closed-loop controller, where both recursive feasibility and task satisfaction can be guaranteed with high probability. We illustrate our control synthesis framework on two case studies.
UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue
Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. In order to address these scenarios, a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response), is developed in this research. This system consists of two aspects: 1) A multimodal system which harnesses the power of Visual Language Model (VLM) and the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation. 2) A non-linearmodel predictive control (NMPC) with built-in obstacle avoidance for rapid response by a drone to fly according to the output of the multimodal system. This work aims at improving response times in emergency SAR operations by providing a more intuitive and natural approach to the operator to plan the SAR mission while allowing the drone to carry out that mission in a rapid and safe manner. When tested, our approach was faster on an average by 33.75% when compared with an off-the-shelf autopilot and 54.6% when compared with a human pilot. Video of UAV-VLRR: https://youtu.be/KJqQGKKt1xY
comment: UAV-VLRR
An Overview of the Prospects and Challenges of Using Artificial Intelligence for Energy Management Systems in Microgrids
Microgrids have emerged as a pivotal solution in the quest for a sustainable and energy-efficient future. While microgrids offer numerous advantages, they are also prone to issues related to reliably forecasting renewable energy demand and production, protecting against cyberattacks, controlling operational costs, optimizing power flow, and regulating the performance of energy management systems (EMS). Tackling these energy management challenges is essential to facilitate microgrid applications and seamlessly incorporate renewable energy resources. Artificial intelligence (AI) has recently demonstrated immense potential for optimizing energy management in microgrids, providing efficient and reliable solutions. This paper highlights the combined benefits of enabling AI-based methodologies in the energy management systems of microgrids by examining the applicability and efficiency of AI-based EMS in achieving specific technical and economic objectives. The paper also points out several future research directions that promise to spearhead AI-driven EMS, namely the development of self-healing microgrids, integration with blockchain technology, use of Internet of things (IoT), and addressing interpretability, data privacy, scalability, and the prospects to generative AI in the context of future AI-based EMS.
comment: 62 pages, 6 figures
Systematic interval observer design for linear systems
We first develop systematic and comprehensive interval observer designs for linear time-invariant (LTI) systems, under standard assumptions of observability and interval bounds on the initial condition and uncertainties. Traditionally, such designs rely on specific transformations into Metzler (in continuous time) or non-negative (in discrete time) forms, which may impose limitations. We demonstrate that these can be effectively replaced by an LTI transformation that is straightforward to compute offline. Subsequently, we extend the framework to time-varying systems, overcoming the limitations of conventional approaches that offer no guarantees. Our method utilizes dynamic transformations into higher-dimensional target systems, for which interval observers can always be constructed. These transformations become left-invertible after a finite time, provided the system is observable and the target dynamics are sufficiently high-dimensional and fast, thereby enabling the recovery of interval bounds in the original coordinates. Academic examples are provided to illustrate the proposed methodology.
comment: 7 pages, 3 figures
Robotics
H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Visuomotor policy learning has witnessed substantial progress in robotic manipulation, with recent approaches predominantly relying on generative models to model the action distribution. However, these methods often overlook the critical coupling between visual perception and action prediction. In this work, we introduce $\textbf{Triply-Hierarchical Diffusion Policy}~(\textbf{H$^{\mathbf{3}}$DP})$, a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. H$^{3}$DP contains $\mathbf{3}$ levels of hierarchy: (1) depth-aware input layering that organizes RGB-D observations based on depth information; (2) multi-scale visual representations that encode semantic features at varying levels of granularity; and (3) a hierarchically conditioned diffusion process that aligns the generation of coarse-to-fine actions with corresponding visual features. Extensive experiments demonstrate that H$^{3}$DP yields a $\mathbf{+27.5\%}$ average relative improvement over baselines across $\mathbf{44}$ simulation tasks and achieves superior performance in $\mathbf{4}$ challenging bimanual real-world manipulation tasks. Project Page: https://lyy-iiis.github.io/h3dp/.
Pixel Motion as Universal Representation for Robot Control
We present LangToMo, a vision-language-action framework structured as a dual-system architecture that uses pixel motion forecasts as intermediate representations. Our high-level System 2, an image diffusion model, generates text-conditioned pixel motion sequences from a single frame to guide robot control. Pixel motion-a universal, interpretable, and motion-centric representation-can be extracted from videos in a self-supervised manner, enabling diffusion model training on web-scale video-caption data. Treating generated pixel motion as learned universal representations, our low level System 1 module translates these into robot actions via motion-to-action mapping functions, which can be either hand-crafted or learned with minimal supervision. System 2 operates as a high-level policy applied at sparse temporal intervals, while System 1 acts as a low-level policy at dense temporal intervals. This hierarchical decoupling enables flexible, scalable, and generalizable robot control under both unsupervised and supervised settings, bridging the gap between language, motion, and action. Checkout https://kahnchana.github.io/LangToMo for visualizations.
Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models
Exploration is essential for general-purpose robotic learning, especially in open-ended environments where dense rewards, explicit goals, or task-specific supervision are scarce. Vision-language models (VLMs), with their semantic reasoning over objects, spatial relations, and potential outcomes, present a compelling foundation for generating high-level exploratory behaviors. However, their outputs are often ungrounded, making it difficult to determine whether imagined transitions are physically feasible or informative. To bridge the gap between imagination and execution, we present IVE (Imagine, Verify, Execute), an agentic exploration framework inspired by human curiosity. Human exploration is often driven by the desire to discover novel scene configurations and to deepen understanding of the environment. Similarly, IVE leverages VLMs to abstract RGB-D observations into semantic scene graphs, imagine novel scenes, predict their physical plausibility, and generate executable skill sequences through action tools. We evaluate IVE in both simulated and real-world tabletop environments. The results show that IVE enables more diverse and meaningful exploration than RL baselines, as evidenced by a 4.1 to 7.8x increase in the entropy of visited states. Moreover, the collected experience supports downstream learning, producing policies that closely match or exceed the performance of those trained on human-collected demonstrations.
comment: Project webpage: https://ive-robot.github.io/
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments-nearly four times higher than policies trained with robot data only-and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions at https://dexwild.github.io
comment: In RSS 2025. Website at https://dexwild.github.io
AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions
Acoustophoresis has enabled novel interaction capabilities, such as levitation, volumetric displays, mid-air haptic feedback, and directional sound generation, to open new forms of multimodal interactions. However, its traditional implementation as a singular static unit limits its dynamic range and application versatility. This paper introduces AcoustoBots - a novel convergence of acoustophoresis with a movable and reconfigurable phased array of transducers for enhanced application versatility. We mount a phased array of transducers on a swarm of robots to harness the benefits of multiple mobile acoustophoretic units. This offers a more flexible and interactive platform that enables a swarm of acoustophoretic multimodal interactions. Our novel AcoustoBots design includes a hinge actuation system that controls the orientation of the mounted phased array of transducers to achieve high flexibility in a swarm of acoustophoretic multimodal interactions. In addition, we designed a BeadDispenserBot that can deliver particles to trapping locations, which automates the acoustic levitation interaction. These attributes allow AcoustoBots to independently work for a common cause and interchange between modalities, allowing for novel augmentations (e.g., a swarm of haptics, audio, and levitation) and bilateral interactions with users in an expanded interaction area. We detail our design considerations, challenges, and methodological approach to extend acoustophoretic central control in distributed settings. This work demonstrates a scalable acoustic control framework with two mobile robots, laying the groundwork for future deployment in larger robotic swarms. Finally, we characterize the performance of our AcoustoBots and explore the potential interactive scenarios they can enable.
Improving Trajectory Stitching with Flow Models
Generative models have shown great promise as trajectory planners, given their affinity to modeling complex distributions and guidable inference process. Previous works have successfully applied these in the context of robotic manipulation but perform poorly when the required solution does not exist as a complete trajectory within the training set. We identify that this is a result of being unable to plan via stitching, and subsequently address the architectural and dataset choices needed to remedy this. On top of this, we propose a novel addition to the training and inference procedures to both stabilize and enhance these capabilities. We demonstrate the efficacy of our approach by generating plans with out of distribution boundary conditions and performing obstacle avoidance on the Franka Panda in simulation and on real hardware. In both of these tasks our method performs significantly better than the baselines and is able to avoid obstacles up to four times as large.
Multi-Agent Path Finding via Finite-Horizon Hierarchical Factorization
We present a novel algorithm for large-scale Multi-Agent Path Finding (MAPF) that enables fast, scalable planning in dynamic environments such as automated warehouses. Our approach introduces finite-horizon hierarchical factorization, a framework that plans one step at a time in a receding-horizon fashion. Robots first compute individual plans in parallel, and then dynamically group based on spatio-temporal conflicts and reachability. The framework accounts for conflict resolution, and for immediate execution and concurrent planning, significantly reducing response time compared to offline algorithms. Experimental results on benchmark maps demonstrate that our method achieves up to 60% reduction in time-to-first-action while consistently delivering high-quality solutions, outperforming state-of-the-art offline baselines across a range of problem sizes and planning horizons.
Privacy Risks of Robot Vision: A User Study on Image Modalities and Resolution
User privacy is a crucial concern in robotic applications, especially when mobile service robots are deployed in personal or sensitive environments. However, many robotic downstream tasks require the use of cameras, which may raise privacy risks. To better understand user perceptions of privacy in relation to visual data, we conducted a user study investigating how different image modalities and image resolutions affect users' privacy concerns. The results show that depth images are broadly viewed as privacy-safe, and a similarly high proportion of respondents feel the same about semantic segmentation images. Additionally, the majority of participants consider 32*32 resolution RGB images to be almost sufficiently privacy-preserving, while most believe that 16*16 resolution can fully guarantee privacy protection.
Guiding Data Collection via Factored Scaling Curves
Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors) $-$ a prohibitively expensive undertaking, if done exhaustively. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance varies as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. We evaluate the proposed method through extensive simulated and real-world experiments, across both training-from-scratch and fine-tuning settings, and show that it boosts success rates in real-world tasks in new environments by up to 26% over existing data-collection strategies. We further demonstrate how factored scaling curves can effectively guide data collection using an offline metric, without requiring real-world evaluation at scale.
comment: Project website: https://factored-data-scaling.github.io
Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing
Safety, reliability, and user trust are crucial in human-robot interaction (HRI) where the robots must address hazards in real-time. This study presents hazard driven low-level control strategies implemented in robot-assisted dressing (RAD) scenarios where hazards like garment snags and user discomfort in real-time can affect task performance and user safety. The proposed control mechanisms include: (1) Garment Snagging Control Strategy, which detects excessive forces and either seeks user intervention via a chatbot or autonomously adjusts its trajectory, and (2) User Discomfort/Pain Mitigation Strategy, which dynamically reduces velocity based on user feedback and aborts the task if necessary. We used physical dressing trials in order to evaluate these control strategies. Results confirm that integrating force monitoring with user feedback improves safety and task continuity. The findings emphasise the need for hybrid approaches that balance autonomous intervention, user involvement, and controlled task termination, supported by bi-directional interaction and real-time user-driven adaptability, paving the way for more responsive and personalised HRI systems.
FD-RIO: Fast Dense Radar Inertial Odometry
Radar-based odometry is a popular solution for ego-motion estimation in conditions where other exteroceptive sensors may degrade, whether due to poor lighting or challenging weather conditions; however, scanning radars have the downside of relatively lower sampling rate and spatial resolution. In this work, we present FD-RIO, a method to alleviate this problem by fusing noisy, drift-prone, but high-frequency IMU data with dense radar scans. To the best of our knowledge, this is the first attempt to fuse dense scanning radar odometry with IMU using a Kalman filter. We evaluate our methods using two publicly available datasets and report accuracies using standard KITTI evaluation metrics, in addition to ablation tests and runtime analysis. Our phase correlation -based approach is compact, intuitive, and is designed to be a practical solution deployable on a realistic hardware setup of a mobile platform. Despite its simplicity, FD-RIO is on par with other state-of-the-art methods and outperforms in some test sequences.
comment: 10 pages, 24 figures
DATAMUt: Deterministic Algorithms for Time-Delay Attack Detection in Multi-Hop UAV Networks
Unmanned Aerial Vehicles (UAVs), also known as drones, have gained popularity in various fields such as agriculture, emergency response, and search and rescue operations. UAV networks are susceptible to several security threats, such as wormhole, jamming, spoofing, and false data injection. Time Delay Attack (TDA) is a unique attack in which malicious UAVs intentionally delay packet forwarding, posing significant threats, especially in time-sensitive applications. It is challenging to distinguish malicious delay from benign network delay due to the dynamic nature of UAV networks, intermittent wireless connectivity, or the Store-Carry-Forward (SCF) mechanism during multi-hop communication. Some existing works propose machine learning-based centralized approaches to detect TDA, which are computationally intensive and have large message overheads. This paper proposes a novel approach DATAMUt, where the temporal dynamics of the network are represented by a weighted time-window graph (TWiG), and then two deterministic polynomial-time algorithms are presented to detect TDA when UAVs have global and local network knowledge. Simulation studies show that the proposed algorithms have reduced message overhead by a factor of five and twelve in global and local knowledge, respectively, compared to existing approaches. Additionally, our approaches achieve approximately 860 and 1050 times less execution time in global and local knowledge, respectively, outperforming the existing methods.
Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots
[...] With the TelePhysicalOperation interface, the user can teleoperate the different capabilities of a robot (e.g., single/double arm manipulation, wheel/leg locomotion) by applying virtual forces on selected robot body parts. This approach emulates the intuitiveness of physical human-robot interaction, but at the same time it permits to teleoperate the robot from a safe distance, in a way that resembles a "Marionette" interface. The system is further enhanced with wearable haptic feedback functions to align better with the "Marionette" metaphor, and a user study has been conducted to validate its efficacy with and without the haptic channel enabled. Considering the importance of robot independence, the TelePhysicalOperation interface incorporates autonomy modules to face, for example, the teleoperation of dual-arm mobile base robots for bimanual object grasping and transportation tasks. With the laser-guided interface, the user can indicate points of interest to the robot through the utilization of a simple but effective laser emitter device. With a neural network-based vision system, the robot tracks the laser projection in real time, allowing the user to indicate not only fixed goals, like objects, but also paths to follow. With the implemented autonomous behavior, a mobile manipulator employs its locomanipulation abilities to follow the indicated goals. The behavior is modeled using Behavior Trees, exploiting their reactivity to promptly respond to changes in goal positions, and their modularity to adapt the motion planning to the task needs. The proposed laser interface has also been employed in an assistive scenario. In this case, users with upper limbs impairments can control an assistive manipulator by directing a head-worn laser emitter to the point of interests, to collaboratively address activities of everyday life. [...]
Neural Brain: A Neuroscience-inspired Framework for Embodied Agents
The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the rise of embodied AI, where autonomous agents, such as humanoid robots, must navigate and manipulate unstructured environments with human-like adaptability. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities. Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment. To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization. Furthermore, we also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence. By synthesizing insights from neuroscience, we outline a roadmap towards the development of generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.
comment: 51 pages, 17 figures, 9 tables
Beyond Static Perception: Integrating Temporal Context into VLMs for Cloth Folding ICRA 2025
Manipulating clothes is challenging due to their complex dynamics, high deformability, and frequent self-occlusions. Garments exhibit a nearly infinite number of configurations, making explicit state representations difficult to define. In this paper, we analyze BiFold, a model that predicts language-conditioned pick-and-place actions from visual observations, while implicitly encoding garment state through end-to-end learning. To address scenarios such as crumpled garments or recovery from failed manipulations, BiFold leverages temporal context to improve state estimation. We examine the internal representations of the model and present evidence that its fine-tuning and temporal context enable effective alignment between text and image regions, as well as temporal consistency.
comment: Accepted at ICRA 2025 Workshop "Reflections on Representations and Manipulating Deformable Objects". Project page https://barbany.github.io/bifold/
On rapid parallel tuning of controllers of a swarm of MAVs -- distribution strategies of the updated gains
In this paper, we present a reliable, scalable, time deterministic, model-free procedure to tune swarms of Micro Aerial Vehicles (MAVs) using basic sensory data. Two approaches to taking advantage of parallel tuning are presented. First, the tuning with averaging of the results on the basis of performance indices reported from the swarm with identical gains to decrease the negative effect of the noise in the measurements. Second, the tuning with parallel testing of varying set of gains across the swarm to reduce the tuning time. The presented methods were evaluated both in simulation and real-world experiments. The achieved results show the ability of the proposed approach to improve the results of the tuning while decreasing the tuning time, ensuring at the same time a reliable tuning mechanism.
comment: 7 pages, 7 figures
Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks
This report presents our reinforcement learning-based approach for the swing-up and stabilisation tasks of the acrobot and pendubot, tailored specifcially to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building upon our previously developed Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to effectively address the new competition scenarios and evaluation metrics. Extensive simulations validate that our controller robustly manages these revised tasks, demonstrating adaptability and effectiveness within the updated framework.
GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion
Visuotactile sensing offers rich contact information that can help mitigate performance bottlenecks in imitation learning, particularly under vision-limited conditions, such as ambiguous visual cues or occlusions. Effectively fusing visual and visuotactile modalities, however, presents ongoing challenges. We introduce GelFusion, a framework designed to enhance policies by integrating visuotactile feedback, specifically from high-resolution GelSight sensors. GelFusion using a vision-dominated cross-attention fusion mechanism incorporates visuotactile information into policy learning. To better provide rich contact information, the framework's core component is our dual-channel visuotactile feature representation, simultaneously leveraging both texture-geometric and dynamic interaction features. We evaluated GelFusion on three contact-rich tasks: surface wiping, peg insertion, and fragile object pick-and-place. Outperforming baselines, GelFusion shows the value of its structure in improving the success rate of policy learning.
TPT-Bench: A Large-Scale, Long-Term and Robot-Egocentric Dataset for Benchmarking Target Person Tracking
Tracking a target person from robot-egocentric views is crucial for developing autonomous robots that provide continuous personalized assistance or collaboration in Human-Robot Interaction (HRI) and Embodied AI. However, most existing target person tracking (TPT) benchmarks are limited to controlled laboratory environments with few distractions, clean backgrounds, and short-term occlusions. In this paper, we introduce a large-scale dataset designed for TPT in crowded and unstructured environments, demonstrated through a robot-person following task. The dataset is collected by a human pushing a sensor-equipped cart while following a target person, capturing human-like following behavior and emphasizing long-term tracking challenges, including frequent occlusions and the need for re-identification from numerous pedestrians. It includes multi-modal data streams, including odometry, 3D LiDAR, IMU, panoptic, and RGB-D images, along with exhaustively annotated 2D bounding boxes of the target person across 35 sequences, both indoors and outdoors. Using this dataset and visual annotations, we perform extensive experiments with existing TPT methods, offering a thorough analysis of their limitations and suggesting future research directions.
comment: Under review. web: https://medlartea.github.io/tpt-bench/
Cooperative Assembly with Autonomous Mobile Manipulators in an Underwater Scenario
[...] Specifically, the problem addressed is an assembly one known as the peg-in-hole task. In this case, two autonomous manipulators must carry cooperatively (at kinematic level) a peg and must insert it into an hole fixed in the environment. Even if the peg-in-hole is a well-known problem, there are no specific studies related to the use of two different autonomous manipulators, especially in underwater scenarios. Among all the possible investigations towards the problem, this work focuses mainly on the kinematic control of the robots. The methods used are part of the Task Priority Inverse Kinematics (TPIK) approach, with a cooperation scheme that permits to exchange as less information as possible between the agents (that is really important being water a big impediment for communication). A force-torque sensor is exploited at kinematic level to help the insertion phase. The results show how the TPIK and the chosen cooperation scheme can be used for the stated problem. The simulated experiments done consider little errors in the hole's pose, that still permit to insert the peg but with a lot of frictions and possible stucks. It is shown how can be possible to improve (thanks to the data provided by the force-torque sensor) the insertion phase performed by the two manipulators in presence of these errors. [...]
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Vision-Language-Action (VLA) models have shown great potential in general robotic decision-making tasks via imitation learning. However, the variable quality of training data often constrains the performance of these models. On the other hand, offline Reinforcement Learning (RL) excels at learning robust policy models from mixed-quality data. In this paper, we introduce Reinforced robot GPT (ReinboT), a novel end-to-end VLA model that integrates the RL principle of maximizing cumulative reward. ReinboT achieves a deeper understanding of the data quality distribution by predicting dense returns that capture the nuances of manipulation tasks. The dense return prediction capability enables the robot to generate more robust decision-making actions, oriented towards maximizing future benefits. Extensive experiments show that ReinboT achieves state-of-the-art performance on the CALVIN mixed-quality dataset and exhibits superior few-shot learning and out-of-distribution generalization capabilities in real-world tasks.
Stiffness-based Analytic Centre Method for Cable-Driven Parallel Robots
Nowadays, being fast and precise are key requirements in Robotics. This work introduces a novel methodology to tune the stiffness of Cable-Driven Parallel Robots (CDPRs) while simultaneously addressing the tension distribution problem. In particular, the approach relies on the Analytic-Centre method. Indeed, weighting the barrier functions makes natural the stiffness adaptation. The intrinsic ability to adjust the stiffness during the execution of the task enables the CDPRs to effectively meet above-mentioned requirements. The capabilities of the method are demonstrated through simulations by comparing it with the existing approach.
comment: Conference paper
Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
Autonomous racing presents unique challenges due to its non-linear dynamics, the high speed involved, and the critical need for real-time decision-making under dynamic and unpredictable conditions. Most traditional Reinforcement Learning (RL) approaches rely on extensive simulation-based pre-training, which faces crucial challenges in transfer effectively to real-world environments. This paper introduces a robust on-board RL framework for autonomous racing, designed to eliminate the dependency on simulation-based pre-training enabling direct real-world adaptation. The proposed system introduces a refined Soft Actor-Critic (SAC) algorithm, leveraging a residual RL structure to enhance classical controllers in real-time by integrating multi-step Temporal-Difference (TD) learning, an asynchronous training pipeline, and Heuristic Delayed Reward Adjustment (HDRA) to improve sample efficiency and training stability. The framework is validated through extensive experiments on the F1TENTH racing platform, where the residual RL controller consistently outperforms the baseline controllers and achieves up to an 11.5 % reduction in lap times compared to the State-of-the-Art (SotA) with only 20 min of training. Additionally, an End-to-End (E2E) RL controller trained without a baseline controller surpasses the previous best results with sustained on-track learning. These findings position the framework as a robust solution for high-performance autonomous racing and a promising direction for other real-time, dynamic autonomous systems.
Autonomous Robotic Pruning in Orchards and Vineyards: a Review
Manual pruning is labor intensive and represents up to 25% of annual labor costs in fruit production, notably in apple orchards and vineyards where operational challenges and cost constraints limit the adoption of large-scale machinery. In response, a growing body of research is investigating compact, flexible robotic platforms capable of precise pruning in varied terrains, particularly where traditional mechanization falls short. This paper reviews recent advances in autonomous robotic pruning for orchards and vineyards, addressing a critical need in precision agriculture. Our review examines literature published between 2014 and 2024, focusing on innovative contributions across key system components. Special attention is given to recent developments in machine vision, perception, plant skeletonization, and control strategies, areas that have experienced significant influence from advancements in artificial intelligence and machine learning. The analysis situates these technological trends within broader agricultural challenges, including rising labor costs, a decline in the number of young farmers, and the diverse pruning requirements of different fruit species such as apple, grapevine, and cherry trees. By comparing various robotic architectures and methodologies, this survey not only highlights the progress made toward autonomous pruning but also identifies critical open challenges and future research directions. The findings underscore the potential of robotic systems to bridge the gap between manual and mechanized operations, paving the way for more efficient, sustainable, and precise agricultural practices.
HuB: Learning Extreme Humanoid Balance
The human body demonstrates exceptional motor capabilities-such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters-both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In this work, we identify three key obstacles: instability from reference motion errors, learning difficulties due to morphological mismatch, and the sim-to-real gap caused by sensor noise and unmodeled dynamics. To address these challenges, we propose HuB (Humanoid Balance), a unified framework that integrates reference motion refinement, balance-aware policy learning, and sim-to-real robustness training, with each component targeting a specific challenge. We validate our approach on the Unitree G1 humanoid robot across challenging quasi-static balance tasks, including extreme single-legged poses such as Swallow Balance and Bruce Lee's Kick. Our policy remains stable even under strong physical disturbances-such as a forceful soccer strike-while baseline methods consistently fail to complete these tasks. Project website: https://hub-robot.github.io
comment: Project website: https://hub-robot.github.io
BETTY Dataset: A Multi-modal Dataset for Full-Stack Autonomy ICRA 2025
We present the BETTY dataset, a large-scale, multi-modal dataset collected on several autonomous racing vehicles, targeting supervised and self-supervised state estimation, dynamics modeling, motion forecasting, perception, and more. Existing large-scale datasets, especially autonomous vehicle datasets, focus primarily on supervised perception, planning, and motion forecasting tasks. Our work enables multi-modal, data-driven methods by including all sensor inputs and the outputs from the software stack, along with semantic metadata and ground truth information. The dataset encompasses 4 years of data, currently comprising over 13 hours and 32TB, collected on autonomous racing vehicle platforms. This data spans 6 diverse racing environments, including high-speed oval courses, for single and multi-agent algorithm evaluation in feature-sparse scenarios, as well as high-speed road courses with high longitudinal and lateral accelerations and tight, GPS-denied environments. It captures highly dynamic states, such as 63 m/s crashes, loss of tire traction, and operation at the limit of stability. By offering a large breadth of cross-modal and dynamic data, the BETTY dataset enables the training and testing of full autonomy stack pipelines, pushing the performance of all algorithms to the limits. The current dataset is available at https://pitt-mit-iac.github.io/betty-dataset/.
comment: 8 pages. 5 figures. ICRA 2025
CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks
Diffusion-based planners have shown strong performance in short-horizon tasks but often fail in complex, long-horizon settings. We trace the failure to loose coupling between high-level (HL) sub-goal selection and low-level (LL) trajectory generation, which leads to incoherent plans and degraded performance. We propose Coupled Hierarchical Diffusion (CHD), a framework that models HL sub-goals and LL trajectories jointly within a unified diffusion process. A shared classifier passes LL feedback upstream so that sub-goals self-correct while sampling proceeds. This tight HL-LL coupling improves trajectory coherence and enables scalable long-horizon diffusion planning. Experiments across maze navigation, tabletop manipulation, and household environments show that CHD consistently outperforms both flat and hierarchical diffusion baselines.
A Framework for Joint Grasp and Motion Planning in Confined Spaces
Robotic grasping is a fundamental skill across all domains of robot applications. There is a large body of research for grasping objects in table-top scenarios, where finding suitable grasps is the main challenge. In this work, we are interested in scenarios where the objects are in confined spaces and hence particularly difficult to reach. Planning how the robot approaches the object becomes a major part of the challenge, giving rise to methods for joint grasp and motion planning. The framework proposed in this paper provides 20 benchmark scenarios with systematically increasing difficulty, realistic objects with precomputed grasp annotations, and tools to create and share more scenarios. We further provide two baseline planners and evaluate them on the scenarios, demonstrating that the proposed difficulty levels indeed offer a meaningful progression. We invite the research community to build upon this framework by making all components publicly available as open source.
comment: 2024 13th International Workshop on Robot Motion and Control (RoMoCo)
Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking
This work addresses the critical lack of precision in state estimation in the Kalman filter for 3D multi-object tracking (MOT) and the ongoing challenge of selecting the appropriate motion model. Existing literature commonly relies on constant motion models for estimating the states of objects, neglecting the complex motion dynamics unique to each object. Consequently, trajectory division and imprecise object localization arise, especially under occlusion conditions. The core of these challenges lies in the limitations of the current Kalman filter formulation, which fails to account for the variability of motion dynamics as objects navigate their environments. This work introduces a novel formulation of the Kalman filter that incorporates motion dynamics, allowing the motion model to adaptively adjust according to changes in the object's movement. The proposed Kalman filter substantially improves state estimation, localization, and trajectory prediction compared to the traditional Kalman filter. This is reflected in tracking performance that surpasses recent benchmarks on the KITTI and Waymo Open Datasets, with margins of 0.56\% and 0.81\% in higher order tracking accuracy (HOTA) and multi-object tracking accuracy (MOTA), respectively. Furthermore, the proposed Kalman filter consistently outperforms the baseline across various detectors. Additionally, it shows an enhanced capability in managing long occlusions compared to the baseline Kalman filter, achieving margins of 1.22\% in higher order tracking accuracy (HOTA) and 1.55\% in multi-object tracking accuracy (MOTA) on the KITTI dataset. The formulation's efficiency is evident, with an additional processing time of only approximately 0.078 ms per frame, ensuring its applicability in real-time applications.
UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning
We present UAV-CodeAgents, a scalable multi-agent framework for autonomous UAV mission generation, built on large language and vision-language models (LLMs/VLMs). The system leverages the ReAct (Reason + Act) paradigm to interpret satellite imagery, ground high-level natural language instructions, and collaboratively generate UAV trajectories with minimal human supervision. A core component is a vision-grounded, pixel-pointing mechanism that enables precise localization of semantic targets on aerial maps. To support real-time adaptability, we introduce a reactive thinking loop, allowing agents to iteratively reflect on observations, revise mission goals, and coordinate dynamically in evolving environments. UAV-CodeAgents is evaluated on large-scale mission scenarios involving industrial and environmental fire detection. Our results show that a lower decoding temperature (0.5) yields higher planning reliability and reduced execution time, with an average mission creation time of 96.96 seconds and a success rate of 93%. We further fine-tune Qwen2.5VL-7B on 9,000 annotated satellite images, achieving strong spatial grounding across diverse visual categories. To foster reproducibility and future research, we will release the full codebase and a novel benchmark dataset for vision-language-based UAV planning.
comment: Submitted
Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection
Generalizing an object detector trained on a single domain to multiple unseen domains is a challenging task. Existing methods typically introduce image or feature augmentation to diversify the source domain to raise the robustness of the detector. Vision-Language Model (VLM)-based augmentation techniques have been proven to be effective, but they require that the detector's backbone has the same structure as the image encoder of VLM, limiting the detector framework selection. To address this problem, we propose Language-Driven Dual Style Mixing (LDDS) for single-domain generalization, which diversifies the source domain by fully utilizing the semantic information of the VLM. Specifically, we first construct prompts to transfer style semantics embedded in the VLM to an image translation network. This facilitates the generation of style diversified images with explicit semantic information. Then, we propose image-level style mixing between the diversified images and source domain images. This effectively mines the semantic information for image augmentation without relying on specific augmentation selections. Finally, we propose feature-level style mixing in a double-pipeline manner, allowing feature augmentation to be model-agnostic and can work seamlessly with the mainstream detector frameworks, including the one-stage, two-stage, and transformer-based detectors. Extensive experiments demonstrate the effectiveness of our approach across various benchmark datasets, including real to cartoon and normal to adverse weather tasks. The source code and pre-trained models will be publicly available at https://github.com/qinhongda8/LDDS.
comment: The source code and pre-trained models will be publicly available at https://github.com/qinhongda8/LDDS
SLAG: Scalable Language-Augmented Gaussian Splatting
Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM and CLIP. Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18 times speedup in embedding computation on a 16-GPU setup compared to OpenGaussian, while preserving embedding quality on the ScanNet and LERF datasets. For more details, visit our project website: https://slag-project.github.io/.
Graph-Based Floor Separation Using Node Embeddings and Clustering of WiFi Trajectories
Indoor positioning systems (IPSs) are increasingly vital for location-based services in complex multi-storey environments. This study proposes a novel graph-based approach for floor separation using Wi-Fi fingerprint trajectories, addressing the challenge of vertical localization in indoor settings. We construct a graph where nodes represent Wi-Fi fingerprints, and edges are weighted by signal similarity and contextual transitions. Node2Vec is employed to generate low-dimensional embeddings, which are subsequently clustered using K-means to identify distinct floors. Evaluated on the Huawei University Challenge 2021 dataset, our method outperforms traditional community detection algorithms, achieving an accuracy of 68.97%, an F1- score of 61.99%, and an Adjusted Rand Index of 57.19%. By publicly releasing the preprocessed dataset and implementation code, this work contributes to advancing research in indoor positioning. The proposed approach demonstrates robustness to signal noise and architectural complexities, offering a scalable solution for floor-level localization.
What Matters for Batch Online Reinforcement Learning in Robotics?
The ability to learn from large batches of autonomously collected data for policy improvement -- a paradigm we refer to as batch online reinforcement learning -- holds the promise of enabling truly scalable robot learning by significantly reducing the need for human effort of data collection while getting benefits from self-improvement. Yet, despite the promise of this paradigm, it remains challenging to achieve due to algorithms not being able to learn effectively from the autonomous data. For example, prior works have applied imitation learning and filtered imitation learning methods to the batch online RL problem, but these algorithms often fail to efficiently improve from the autonomously collected data or converge quickly to a suboptimal point. This raises the question of what matters for effective batch online RL in robotics. Motivated by this question, we perform a systematic empirical study of three axes -- (i) algorithm class, (ii) policy extraction methods, and (iii) policy expressivity -- and analyze how these axes affect performance and scaling with the amount of autonomous data. Through our analysis, we make several observations. First, we observe that the use of Q-functions to guide batch online RL significantly improves performance over imitation-based methods. Building on this, we show that an implicit method of policy extraction -- via choosing the best action in the distribution of the policy -- is necessary over traditional policy extraction methods from offline RL. Next, we show that an expressive policy class is preferred over less expressive policy classes. Based on this analysis, we propose a general recipe for effective batch online RL. We then show a simple addition to the recipe of using temporally-correlated noise to obtain more diversity results in further performance gains. Our recipe obtains significantly better performance and scaling compared to prior methods.
Land-Coverage Aware Path-Planning for Multi-UAV Swarms in Search and Rescue Scenarios
Unmanned Aerial Vehicles (UAVs) have become vital in search-and-rescue (SAR) missions, with autonomous mission planning improving response times and coverage efficiency. Early approaches primarily used path planning techniques such as A*, potential-fields, or Dijkstra's algorithm, while recent approaches have incorporated meta-heuristic frameworks like genetic algorithms and particle swarm optimization to balance competing objectives such as network connectivity, energy efficiency, and strategic placement of charging stations. However, terrain-aware path planning remains under-explored, despite its critical role in optimizing UAV SAR deployments. To address this gap, we present a computer-vision based terrain-aware mission planner that autonomously extracts and analyzes terrain topology to enhance SAR pre-flight planning. Our framework uses a deep segmentation network fine-tuned on our own collection of landcover datasets to transform satellite imagery into a structured, grid-based representation of the operational area. This classification enables terrain-specific UAV-task allocation, improving deployment strategies in complex environments. We address the challenge of irregular terrain partitions, by introducing a two-stage partitioning scheme that first evaluates terrain monotonicity along coordinate axes before applying a cost-based recursive partitioning process, minimizing unnecessary splits and optimizing path efficiency. Empirical validation in a high-fidelity simulation environment demonstrates that our approach improves search and dispatch time over multiple meta-heuristic techniques and against a competing state-of-the-art method. These results highlight its potential for large-scale SAR operations, where rapid response and efficient UAV coordination are critical.
comment: 8 pages, 4 figures,
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints
We introduce PRISM (Pathfinding with Rapid Information Sharing using Motion Constraints), a decentralized algorithm designed to address the multi-task multi-agent pathfinding (MT-MAPF) problem. PRISM enables large teams of agents to concurrently plan safe and efficient paths for multiple tasks while avoiding collisions. It employs a rapid communication strategy that uses information packets to exchange motion constraint information, enhancing cooperative pathfinding and situational awareness, even in scenarios without direct communication. We prove that PRISM resolves and avoids all deadlock scenarios when possible, a critical challenge in decentralized pathfinding. Empirically, we evaluate PRISM across five environments and 25 random scenarios, benchmarking it against the centralized Conflict-Based Search (CBS) and the decentralized Token Passing with Task Swaps (TPTS) algorithms. PRISM demonstrates scalability and solution quality, supporting 3.4 times more agents than CBS and handling up to 2.5 times more tasks in narrow passage environments than TPTS. Additionally, PRISM matches CBS in solution quality while achieving faster computation times, even under low-connectivity conditions. Its decentralized design reduces the computational burden on individual agents, making it scalable for large environments. These results confirm PRISM's robustness, scalability, and effectiveness in complex and dynamic pathfinding scenarios.
comment: 38 pages, 8 figures
Virtual Holonomic Constraints in Motion Planning: Revisiting Feasibility and Limitations
This paper addresses the feasibility of virtual holonomic constraints (VHCs) in the context of motion planning for underactuated mechanical systems with a single degree of underactuation. While existing literature has established a widely accepted definition of VHC, we argue that this definition is overly restrictive and excludes a broad class of admissible trajectories from consideration. To illustrate this point, we analyze a periodic motion of the Planar Vertical Take-Off and Landing (PVTOL) aircraft. The corresponding phase trajectory and reference control input are analytic functions. We demonstrate the stabilizability of this solution by constructing a feedback controller that ensures asymptotic orbital stability. However, for this solution -- as well as for a broad class of similar ones -- there exists no VHC that satisfies the conventional definition. This observation calls for a reconsideration of how the notion of VHC is defined, with the potential to significantly expand the practical applicability of VHCs in motion planning.
comment: 6 pages, 1 figure
Survey of Simulators for Aerial Robots: An Overview and In-Depth Systematic Comparisons
Uncrewed Aerial Vehicle (UAV) research faces challenges with safety, scalability, costs, and ecological impact when conducting hardware testing. High-fidelity simulators offer a vital solution by replicating real-world conditions to enable the development and evaluation of novel perception and control algorithms. However, the large number of available simulators poses a significant challenge for researchers to determine which simulator best suits their specific use-case, based on each simulator's limitations and customization readiness. In this paper we present an overview of 44 UAV simulators, including in-depth, systematic comparisons for 14 of the simulators. Additionally, we present a set of decision factors for selection of simulators, aiming to enhance the efficiency and safety of research endeavors.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
ReFeree: Radar-Based Lightweight and Robust Localization using Feature and Free space
Place recognition plays an important role in achieving robust long-term autonomy. Real-world robots face a wide range of weather conditions (e.g. overcast, heavy rain, and snowing) and most sensors (i.e. camera, LiDAR) essentially functioning within or near-visible electromagnetic waves are sensitive to adverse weather conditions, making reliable localization difficult. In contrast, radar is gaining traction due to long electromagnetic waves, which are less affected by environmental changes and weather independence. In this work, we propose a radar-based lightweight and robust place recognition. We achieve rotational invariance and lightweight by selecting a one-dimensional ring-shaped description and robustness by mitigating the impact of false detection utilizing opposite noise characteristics between free space and feature. In addition, the initial heading can be estimated, which can assist in building a SLAM pipeline that combines odometry and registration, which takes into account onboard computing. The proposed method was tested for rigorous validation across various scenarios (i.e. single session, multi-session, and different weather conditions). In particular, we validate our descriptor achieving reliable place recognition performance through the results of extreme environments that lacked structural information such as an OORD dataset.
comment: 8 pages, 8 figures, accepted to RA-L
SICNav-Diffusion: Safe and Interactive Crowd Navigation with Diffusion Trajectory Predictions
To navigate crowds without collisions, robots must interact with humans by forecasting their future motion and reacting accordingly. While learning-based prediction models have shown success in generating likely human trajectory predictions, integrating these stochastic models into a robot controller presents several challenges. The controller needs to account for interactive coupling between planned robot motion and human predictions while ensuring both predictions and robot actions are safe (i.e. collision-free). To address these challenges, we present a receding horizon crowd navigation method for single-robot multi-human environments. We first propose a diffusion model to generate joint trajectory predictions for all humans in the scene. We then incorporate these multi-modal predictions into a SICNav Bilevel MPC problem that simultaneously solves for a robot plan (upper-level) and acts as a safety filter to refine the predictions for non-collision (lower-level). Combining planning and prediction refinement into one bilevel problem ensures that the robot plan and human predictions are coupled. We validate the open-loop trajectory prediction performance of our diffusion model on the commonly used ETH/UCY benchmark and evaluate the closed-loop performance of our robot navigation method in simulation and extensive real-robot experiments demonstrating safe, efficient, and reactive robot motion.
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
One of the central challenges preventing robots from acquiring complex manipulation skills is the prohibitive cost of collecting large-scale robot demonstrations. In contrast, humans are able to learn efficiently by watching others interact with their environment. To bridge this gap, we introduce semantic action flow as a core intermediate representation capturing the essential spatio-temporal manipulator-object interactions, invariant to superficial visual differences. We present ViSA-Flow, a framework that learns this representation self-supervised from unlabeled large-scale video data. First, a generative model is pre-trained on semantic action flows automatically extracted from large-scale human-object interaction video data, learning a robust prior over manipulation structure. Second, this prior is efficiently adapted to a target robot by fine-tuning on a small set of robot demonstrations processed through the same semantic abstraction pipeline. We demonstrate through extensive experiments on the CALVIN benchmark and real-world tasks that ViSA-Flow achieves state-of-the-art performance, particularly in low-data regimes, outperforming prior methods by effectively transferring knowledge from human video observation to robotic execution. Videos are available at https://visaflow-web.github.io/ViSAFLOW.
Heterogeneous Multi-robot Task Allocation for Long-Endurance Missions in Dynamic Scenarios
We present a framework for Multi-Robot Task Allocation (MRTA) in heterogeneous teams performing long-endurance missions in dynamic scenarios. Given the limited battery of robots, especially for aerial vehicles, we allow for robot recharges and the possibility of fragmenting and/or relaying certain tasks. We also address tasks that must be performed by a coalition of robots in a coordinated manner. Given these features, we introduce a new class of heterogeneous MRTA problems which we analyze theoretically and optimally formulate as a Mixed-Integer Linear Program. We then contribute a heuristic algorithm to compute approximate solutions and integrate it into a mission planning and execution architecture capable of reacting to unexpected events by repairing or recomputing plans online. Our experimental results show the relevance of our newly formulated problem in a realistic use case for inspection with aerial robots. We assess the performance of our heuristic solver in comparison with other variants and with exact optimal solutions in small-scale scenarios. In addition, we evaluate the ability of our replanning framework to repair plans online.
comment: 20 pages, 10 figures
AdaWorld: Learning Adaptable World Models with Latent Actions ICML 2025
World models aim to learn action-controlled future prediction and have proven essential for the development of intelligent agents. However, most existing world models rely heavily on substantial action-labeled data and costly training, making it challenging to adapt to novel environments with heterogeneous actions through limited interactions. This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. This is achieved by extracting latent actions from videos in a self-supervised manner, capturing the most critical transitions between frames. We then develop an autoregressive world model that conditions on these latent actions. This learning paradigm enables highly adaptable world models, facilitating efficient transfer and learning of new actions even with limited interactions and finetuning. Our comprehensive experiments across multiple environments demonstrate that AdaWorld achieves superior performance in both simulation quality and visual planning.
comment: ICML 2025. Project page: https://adaptable-world-model.github.io/, code and model: https://github.com/Little-Podi/AdaWorld
Aerial Path Online Planning for Urban Scene Updation SIGGRAPH 2025
We present the first scene-update aerial path planning algorithm specifically designed for detecting and updating change areas in urban environments. While existing methods for large-scale 3D urban scene reconstruction focus on achieving high accuracy and completeness, they are inefficient for scenarios requiring periodic updates, as they often re-explore and reconstruct entire scenes, wasting significant time and resources on unchanged areas. To address this limitation, our method leverages prior reconstructions and change probability statistics to guide UAVs in detecting and focusing on areas likely to have changed. Our approach introduces a novel changeability heuristic to evaluate the likelihood of changes, driving the planning of two flight paths: a prior path informed by static priors and a dynamic real-time path that adapts to newly detected changes. The framework integrates surface sampling and candidate view generation strategies, ensuring efficient coverage of change areas with minimal redundancy. Extensive experiments on real-world urban datasets demonstrate that our method significantly reduces flight time and computational overhead, while maintaining high-quality updates comparable to full-scene re-exploration and reconstruction. These contributions pave the way for efficient, scalable, and adaptive UAV-based scene updates in complex urban environments.
comment: ACM SIGGRAPH 2025 (Patent Protected); Project page: https://vcc.tech/research/2025/DroneUpdate; Video: https://youtu.be/3nNBoh2syTU?si=LQ04NlJNyPosdm8F
Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation ICML 2025
Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to capture more diverse information. However, these methods overlook whether the learned representations contain redundant information and lack a solid theoretical foundation to guide the learning process. To address these limitations, we adopt an information-theoretic perspective and introduce mutual information to quantify and mitigate redundancy in latent representations. Building on this, we incorporate the Information Bottleneck (IB) principle into BC, which extends the idea of reducing redundancy by providing a structured framework for compressing irrelevant information while preserving task-relevant features. This work presents the first comprehensive study on redundancy in latent representations across various methods, backbones, and experimental settings, while extending the generalizability of the IB to BC. Extensive experiments and analyses on the CortexBench and LIBERO benchmarks demonstrate significant performance improvements with IB, underscoring the importance of reducing input data redundancy and highlighting its practical value for more practical applications. Project Page: https://baishuanghao.github.io/BC-IB.github.io.
comment: Accepted by ICML 2025
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments
The growing integration of robots in shared environments -- such as warehouses, shopping centres, and hospitals -- demands a deep understanding of the underlying dynamics and human behaviours, including how, when, and where individuals engage in various activities and interactions. This knowledge goes beyond simple correlation studies and requires a more comprehensive causal analysis. By leveraging causal inference to model cause-and-effect relationships, we can better anticipate critical environmental factors and enable autonomous robots to plan and execute tasks more effectively. To this end, we propose a novel causality-based decision-making framework that reasons over a learned causal model to predict battery usage and human obstructions, understanding how these factors could influence robot task execution. Such reasoning framework assists the robot in deciding when and how to complete a given task. To achieve this, we developed also PeopleFlow, a new Gazebo-based simulator designed to model context-sensitive human-robot spatial interactions in shared workspaces. PeopleFlow features realistic human and robot trajectories influenced by contextual factors such as time, environment layout, and robot state, and can simulate a large number of agents. While the simulator is general-purpose, in this paper we focus on a warehouse-like environment as a case study, where we conduct an extensive evaluation benchmarking our causal approach against a non-causal baseline. Our findings demonstrate the efficacy of the proposed solutions, highlighting how causal reasoning enables autonomous robots to operate more efficiently and safely in dynamic environments shared with humans.
comment: Causal Discovery and Inference - Robot Autonomy - Human-Robot Spatial Interaction - Decision-Making
MOANA: Multi-Radar Dataset for Maritime Odometry and Autonomous Navigation Application
Maritime environmental sensing requires overcoming challenges from complex conditions such as harsh weather, platform perturbations, large dynamic objects, and the requirement for long detection ranges. While cameras and LiDAR are commonly used in ground vehicle navigation, their applicability in maritime settings is limited by range constraints and hardware maintenance issues. Radar sensors, however, offer robust long-range detection capabilities and resilience to physical contamination from weather and saline conditions, making it a powerful sensor for maritime navigation. Among various radar types, X-band radar is widely employed for maritime vessel navigation, providing effective long-range detection essential for situational awareness and collision avoidance. Nevertheless, it exhibits limitations during berthing operations where near-field detection is critical. To address this shortcoming, we incorporate W-band radar, which excels in detecting nearby objects with a higher update rate. We present a comprehensive maritime sensor dataset featuring multi-range detection capabilities. This dataset integrates short-range LiDAR data, medium-range W-band radar data, and long-range X-band radar data into a unified framework. Additionally, it includes object labels for oceanic object detection usage, derived from radar and stereo camera images. The dataset comprises seven sequences collected from diverse regions with varying levels of \bl{navigation algorithm} estimation difficulty, ranging from easy to challenging, and includes common locations suitable for global localization tasks. This dataset serves as a valuable resource for advancing research in place recognition, odometry estimation, SLAM, object detection, and dynamic object elimination within maritime environments. Dataset can be found at https://sites.google.com/view/rpmmoana.
comment: 10 pages, 9 figures, 3 tables
Analysis of Forces Exerted by Shoulder and Elbow Fabric-based Pneumatic Actuators for Pediatric Exosuits
To enhance pediatric exosuit design, it is crucial to assess the actuator-generated forces. This work evaluates the contact forces exerted by soft fabric-based pneumatic actuators in an upper extremity pediatric exosuit. Two actuators were examined: a single-cell bidirectional actuator for shoulder abduction/adduction and a bellow-type actuator for elbow extension/flexion. Experiments assessed the impact of actuator anchoring points and the adjacent joint's angle on exerted forces and actuated joint range of motion (ROM). These were measured via load cells and encoders integrated into a custom infant-scale engineered apparatus with two degrees of freedom (two revolute joints). For the shoulder actuator, results show that anchoring it further from the shoulder joint center while the elbow is flexed at $90^\circ$ yields the highest ROM while minimizing the peak force exerted on the body. For the elbow actuator, anchoring it symmetrically while the shoulder joint is at $0^\circ$ optimizes actuator performance. These findings contribute a key step toward co-optimizing the considered exosuit design for functionality and wearability.
Stereo Hand-Object Reconstruction for Human-to-Robot Handover
Jointly estimating hand and object shape facilitates the grasping task in human-to-robot handovers. However, relying on hand-crafted prior knowledge about the geometric structure of the object fails when generalising to unseen objects, and depth sensors fail to detect transparent objects such as drinking glasses. In this work, we propose a stereo-based method for hand-object reconstruction that combines single-view reconstructions probabilistically to form a coherent stereo reconstruction. We learn 3D shape priors from a large synthetic hand-object dataset to ensure that our method is generalisable, and use RGB inputs to better capture transparent objects. We show that our method reduces the object Chamfer distance compared to existing RGB based hand-object reconstruction methods on single view and stereo settings. We process the reconstructed hand-object shape with a projection-based outlier removal step and use the output to guide a human-to-robot handover pipeline with wide-baseline stereo RGB cameras. Our hand-object reconstruction enables a robot to successfully receive a diverse range of household objects from the human.
comment: 8 pages, 9 figures, 1 table. (Website: https://qm-ipalab.github.io/StereoHO/)
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
3D articulated objects modeling has long been a challenging problem, since it requires to capture both accurate surface geometries and semantically meaningful and spatially precise structures, parts, and joints. Existing methods heavily depend on training data from a limited set of handcrafted articulated object categories (e.g., cabinets and drawers), which restricts their ability to model a wide range of articulated objects in an open-vocabulary context. To address these limitations, we propose Articulate Anymesh, an automated framework that is able to convert any rigid 3D mesh into its articulated counterpart in an open-vocabulary manner. Given a 3D mesh, our framework utilizes advanced Vision-Language Models and visual prompting techniques to extract semantic information, allowing for both the segmentation of object parts and the construction of functional joints. Our experiments show that Articulate Anymesh can generate large-scale, high-quality 3D articulated objects, including tools, toys, mechanical devices, and vehicles, significantly expanding the coverage of existing 3D articulated object datasets. Additionally, we show that these generated assets can facilitate the acquisition of new articulated object manipulation skills in simulation, which can then be transferred to a real robotic system. Our Github website is https://articulate-anymesh.github.io.
Learning to Drive Anywhere with Model-Based Reannotation
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
DittoGym: Learning to Control Soft Shape-Shifting Robots
Robot co-design, where the morphology of a robot is optimized jointly with a learned policy to solve a specific task, is an emerging area of research. It holds particular promise for soft robots, which are amenable to novel manufacturing techniques that can realize learned morphologies and actuators. Inspired by nature and recent novel robot designs, we propose to go a step further and explore the novel reconfigurable robots, defined as robots that can change their morphology within their lifetime. We formalize control of reconfigurable soft robots as a high-dimensional reinforcement learning (RL) problem. We unify morphology change, locomotion, and environment interaction in the same action space, and introduce an appropriate, coarse-to-fine curriculum that enables us to discover policies that accomplish fine-grained control of the resulting robots. We also introduce DittoGym, a comprehensive RL benchmark for reconfigurable soft robots that require fine-grained morphology changes to accomplish the tasks. Finally, we evaluate our proposed coarse-to-fine algorithm on DittoGym and demonstrate robots that learn to change their morphology several times within a sequence, uniquely enabled by our RL algorithm. More results are available at https://suninghuang19.github.io/dittogym_page/.
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Language-goal aerial navigation is a critical challenge in embodied AI, requiring UAVs to localize targets in complex environments such as urban blocks based on textual specification. Existing methods, often adapted from indoor navigation, struggle to scale due to limited field of view, semantic ambiguity among objects, and lack of structured spatial reasoning. In this work, we propose GeoNav, a geospatially aware multimodal agent to enable long-range navigation. GeoNav operates in three phases-landmark navigation, target search, and precise localization-mimicking human coarse-to-fine spatial strategies. To support such reasoning, it dynamically builds two different types of spatial memory. The first is a global but schematic cognitive map, which fuses prior textual geographic knowledge and embodied visual cues into a top-down, annotated form for fast navigation to the landmark region. The second is a local but delicate scene graph representing hierarchical spatial relationships between blocks, landmarks, and objects, which is used for definite target localization. On top of this structured representation, GeoNav employs a spatially aware, multimodal chain-of-thought prompting mechanism to enable multimodal large language models with efficient and interpretable decision-making across stages. On the CityNav urban navigation benchmark, GeoNav surpasses the current state-of-the-art by up to 12.53% in success rate and significantly improves navigation efficiency, even in hard-level tasks. Ablation studies highlight the importance of each module, showcasing how geospatial representations and coarse-to-fine reasoning enhance UAV navigation.
Follow Everything: A Leader-Following and Obstacle Avoidance Framework with Goal-Aware Adaptation
Robust and flexible leader-following is a critical capability for robots to integrate into human society. While existing methods struggle to generalize to leaders of arbitrary form and often fail when the leader temporarily leaves the robot's field of view, this work introduces a unified framework addressing both challenges. First, traditional detection models are replaced with a segmentation model, allowing the leader to be anything. To enhance recognition robustness, a distance frame buffer is implemented that stores leader embeddings at multiple distances, accounting for the unique characteristics of leader-following tasks. Second, a goal-aware adaptation mechanism is designed to govern robot planning states based on the leader's visibility and motion, complemented by a graph-based planner that generates candidate trajectories for each state, ensuring efficient following with obstacle avoidance. Simulations and real-world experiments with a legged robot follower and various leaders (human, ground robot, UAV, legged robot, stop sign) in both indoor and outdoor environments show competitive improvements in follow success rate, reduced visual loss duration, lower collision rate, and decreased leader-follower distance.
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction of dynamic scenes. However, existing methods primarily rely on implicit motion representations, such as encoding motions into neural networks or per-Gaussian parameters, which makes it difficult to further manipulate the reconstructed motions. This lack of explicit controllability limits existing methods to replaying recorded motions only, which hinders a wider application in robotics. To address this, we propose Motion Blender Gaussian Splatting (MBGS), a novel framework that uses motion graphs as an explicit and sparse motion representation. The motion of a graph's links is propagated to individual Gaussians via dual quaternion skinning, with learnable weight painting functions that determine the influence of each link. The motion graphs and 3D Gaussians are jointly optimized from input videos via differentiable rendering. Experiments show that MBGS achieves state-of-the-art performance on the highly challenging iPhone dataset while being competitive on HyperNeRF. We demonstrate the application potential of our method in animating novel object poses, synthesizing real robot demonstrations, and predicting robot actions through visual planning. The source code, models, video demonstrations can be found at http://mlzxy.github.io/motion-blender-gs.
Consensus Building in Human-robot Co-learning via Bias Controlled Nonlinear Opinion Dynamics and Non-verbal Communication through Robotic Eyes
Consensus between humans and robots is crucial as robotic agents become more prevalent and deeply integrated into our daily lives. This integration presents both unprecedented opportunities and notable challenges for effective collaboration. However, the active guidance of human actions and their integration in co-learning processes, where humans and robots mutually learn from each other, remains under-explored. This article demonstrates how consensus between human and robot opinions can be established by modeling decision-making processes as non-linear opinion dynamics. We utilize dynamic bias as a control parameter to steer the robot's opinion toward consensus and employ visual cues via a robotic eye gaze to guide human decisions. These non-verbal cues communicate the robot's future intentions, gradually guiding human decisions to align with them. To design robot behavior for consensus, we integrate a human opinion observation algorithm with the robot's opinion formation, controlling its actions based on that formed opinion. Experiments with $51$ participants ($N=51$) in a two-choice decision-making task show that effective consensus and trust can be established in a human--robot co-learning setting by guiding human decisions through nonverbal robotic cues and using bias-controlled opinion dynamics to shape robot behavior. Finally, we provide detailed information on the perceived cognitive load and the behavior of robotic eyes based on user feedback and post-experiment interviews.
comment: 48 pages, 8 figures
Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots ICRA 2025
This paper addresses the challenge of developing a multi-arm quadrupedal robot capable of efficiently harvesting fruit in complex, natural environments. To overcome the inherent limitations of traditional bimanual manipulation, we introduce the first three-arm quadrupedal robot LocoHarv-3 and propose a novel hierarchical tri-manual planning approach, enabling automated fruit harvesting with collision-free trajectories. Our comprehensive semi-autonomous framework integrates teleoperation, supported by LiDAR-based odometry and mapping, with learning-based visual perception for accurate fruit detection and pose estimation. Validation is conducted through a series of controlled indoor experiments using motion capture and extensive field tests in natural settings. Results demonstrate a 90\% success rate in in-lab settings with a single attempt, and field trials further verify the system's robustness and efficiency in more challenging real-world environments.
comment: 7 pages, 8 figures. Accepted by ICRA 2025
Embodying Control in Soft Multistable Robots from Morphofunctional Co-design
Soft robots are distinguished by their flexibility and adaptability, allowing them to perform nearly impossible tasks for rigid robots. However, controlling their behavior is challenging due to their nonlinear material response and infinite degrees of freedom. A potential solution to these challenges is to discretize the infinite-dimensional configuration space into a finite but sufficiently large number of functional modes with programmed dynamics. We present a strategy for co-designing the desired tasks and morphology of pneumatically actuated soft robots with multiple encoded stable states and dynamic responses. Our approach introduces a general method to capture the soft robots' response using an energy-based analytical model, the parameters of which are obtained using Recursive Feature Elimination. The resulting lumped-parameter model facilitates inverse co-design of the robot's morphology and planned tasks by embodying specific dynamics upon actuation. We illustrate our approach's ability to explore the configuration space by co-designing kinematics with optimized stiffnesses and time responses to obtain robots capable of classifying the size and weight of objects and displaying adaptable locomotion with minimal feedback control. This strategy offers a framework for simplifying the control of soft robots by exploiting the nonlinear mechanics of multistable structures and embodying mechanical intelligence into soft material systems
comment: Manuscript: 13 pages, 5 figures ; Supplementary Information: 9 pages, 9 figures, 4 tables
Unravelling Responsibility for AI
It is widely acknowledged that we need to establish where responsibility lies for the outputs and impacts of AI-enabled systems. This is important to achieve justice and compensation for victims of AI harms, and to inform policy and engineering practice. But without a clear, thorough understanding of what `responsibility' means, deliberations about where responsibility lies will be, at best, unfocused and incomplete and, at worst, misguided. Furthermore, AI-enabled systems exist within a wider ecosystem of actors, decisions, and governance structures, giving rise to complex networks of responsibility relations. To address these issues, this paper presents a conceptual framework of responsibility, accompanied with a graphical notation and general methodology, for visualising these responsibility networks and for tracing different responsibility attributions for AI. Taking the three-part formulation 'Actor A is responsible for Occurrence O,' the framework unravels the concept of responsibility to clarify that there are different possibilities of who is responsible for AI, senses in which they are responsible, and aspects of events they are responsible for. The notation allows these permutations to be represented graphically. The methodology enables users to apply the framework to specific scenarios. The aim is to offer a foundation to support stakeholders from diverse disciplinary backgrounds to discuss and address complex responsibility questions in hypothesised and real-world cases involving AI. The work is illustrated by application to a fictitious scenario of a fatal collision between a crewless, AI-enabled maritime vessel in autonomous mode and a traditional, crewed vessel at sea.
Learning with Less: Optimizing Tactile Sensor Configurations for Dexterous Manipulation
Tactile sensing is critical for learning-based robotic dexterous manipulation, enabling real-time force perception, slip detection, and grip adjustments during interactions. While full-hand sensor arrays provide precise control, their deployment is limited by high costs, complex integration, and significant computational demands. Practical constraints, including limited space and the complexity of the wiring, further restrict the use of the entire sensor. Consequently, optimizing sensor configurations to achieve efficient coverage and good performance using fewer sensors remains a significant and open research challenge.In this work, we investigate the influence of tactile sensor quantity and placement on a robotic hand for dexterous manipulation tasks. Through systematic analysis of various sensor configurations, an optimized layout with only 21 sensors is identified, achieving over 93% of the task success rate relative to full-hand coverage (92 sensors). This configuration reduces the sensor count by 77% and offers a considerable reduction in integration costs, demonstrating a cost-effective yet high-performing tactile sensing strategy. Additionally, we develop a multi-factor regression model to predict task success rate under arbitrary sensor configurations. The model achieves strong generalization, with an average prediction error of 3.12% on unseen manipulation tasks. These results offer a scalable framework for deploying tactile sensing in real-world robotic manipulation systems.
comment: This work has been submitted to the RA-L for possible publication
Distributed formation-enforcing control for UAVs robust to observation noise in relative pose measurements
A technique that allows a Formation-Enforcing Control (FEC) derived from graph rigidity theory to interface with a realistic relative localization system onboard lightweight Unmanned Aerial Vehicles (UAVs) is proposed in this paper. The proposed methodology enables reliable real-world deployment of UAVs in tight formations using relative localization systems burdened by non-negligible sensory noise, which is typically not fully taken into account in FEC algorithms. The proposed solution is based on decomposition of the gradient descent-based FEC command into interpretable elements, and then modifying these individually based on the estimated distribution of sensory noise, such that the resulting action limits the probability of the desired formation. The behavior of the system was analyzed and the practicality of the proposed solution was compared to pure gradient-descent in real-world experiments where it presented significantly better performance in terms of oscillations, deviation from the desired state and convergence time.
comment: Submitted to Robotics and Autonomous Systems journal on May 10. 2025
Multiagent Systems
RAI: Flexible Agent Framework for Embodied AI
With an increase in the capabilities of generative language models, a growing interest in embodied AI has followed. This contribution introduces RAI - a framework for creating embodied Multi Agent Systems for robotics. The proposed framework implements tools for Agents' integration with robotic stacks, Large Language Models, and simulations. It provides out-of-the-box integration with state-of-the-art systems like ROS 2. It also comes with dedicated mechanisms for the embodiment of Agents. These mechanisms have been tested on a physical robot, Husarion ROSBot XL, which was coupled with its digital twin, for rapid prototyping. Furthermore, these mechanisms have been deployed in two simulations: (1) robot arm manipulator and (2) tractor controller. All of these deployments have been evaluated in terms of their control capabilities, effectiveness of embodiment, and perception ability. The proposed framework has been used successfully to build systems with multiple agents. It has demonstrated effectiveness in all the aforementioned tasks. It also enabled identifying and addressing the shortcomings of the generative models used for embodied AI.
comment: 12 pages, 8 figures, submitted to 23rd International Conference on Practical applications of Agents and Multi-Agent Systems (PAAMS'25)
The Complexity of Pure Strategy Relevant Equilibria in Concurrent Games
We study rational synthesis problems for concurrent games with $\omega$-regular objectives. Our model of rationality considers only pure strategy Nash equilibria that satisfy either a social welfare or Pareto optimality condition with respect to an $\omega$-regular objective for each agent. This extends earlier work on equilibria in concurrent games, without consideration about their quality. Our results show that the existence of Nash equilibria satisfying social welfare conditions can be computed as efficiently as the constrained Nash equilibrium existence problem. On the other hand, the existence of Nash equilibria satisfying the Pareto optimality condition possibly involves a higher upper bound, except in the case of B\"uchi and Muller games, for which all three problems are in the classes P and PSPACE-complete, respectively.
Continuous-Time Control Synthesis for Multiple Quadrotors under Signal Temporal Logic Specifications
Ensuring continuous-time control of multiple quadrotors in constrained environments under signal temporal logic (STL) specifications is challenging due to nonlinear dynamics, safety constraints, and disturbances. This letter proposes a two-stage framework to address this challenge. First, exponentially decaying tracking error bounds are derived with multidimensional geometric control gains obtained via differential evolution. These bounds are less conservative, while the resulting tracking errors exhibit smaller oscillations and improved transient performance. Second, leveraging the time-varying bounds, a mixed-integer convex programming (MICP) formulation generates piecewise B\'ezier reference trajectories that satisfy STL and velocity limits, while ensuring inter-agent safety through convex-hull properties. Simulation results demonstrate that the proposed approach enables formally verifiable multi-agent coordination in constrained environments, with provable tracking guarantees under bounded disturbances.
Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning faces significant challenges in effectively organizing agent relationships and facilitating information exchange, particularly when agents need to adapt their coordination patterns dynamically. This paper presents a novel framework that integrates dynamic spectral clustering with hypergraph neural networks to enable adaptive group formation and efficient information processing in multi-agent systems. The proposed framework dynamically constructs and updates hypergraph structures through spectral clustering on agents' state histories, enabling higher-order relationships to emerge naturally from agent interactions. The hypergraph structure is enhanced with attention mechanisms for selective information processing, providing an expressive and efficient way to model complex agent relationships. This architecture can be implemented in both value-based and policy-based paradigms through a unified objective combining task performance with structural regularization. Extensive experiments on challenging cooperative tasks demonstrate that our method significantly outperforms state-of-the-art approaches in both sample efficiency and final performance.
Internet of Agents: Fundamentals, Applications, and Challenges
With the rapid proliferation of large language models and vision-language models, AI agents have evolved from isolated, task-specific systems into autonomous, interactive entities capable of perceiving, reasoning, and acting without human intervention. As these agents proliferate across virtual and physical environments, from virtual assistants to embodied robots, the need for a unified, agent-centric infrastructure becomes paramount. In this survey, we introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We begin by presenting a general IoA architecture, highlighting its hierarchical organization, distinguishing features relative to the traditional Internet, and emerging applications. Next, we analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models. Finally, we identify open research directions toward building resilient and trustworthy IoA ecosystems.
comment: 22 pages,10 figures, 8 tables. Submitted to IEEE TCCN
Land-Coverage Aware Path-Planning for Multi-UAV Swarms in Search and Rescue Scenarios
Unmanned Aerial Vehicles (UAVs) have become vital in search-and-rescue (SAR) missions, with autonomous mission planning improving response times and coverage efficiency. Early approaches primarily used path planning techniques such as A*, potential-fields, or Dijkstra's algorithm, while recent approaches have incorporated meta-heuristic frameworks like genetic algorithms and particle swarm optimization to balance competing objectives such as network connectivity, energy efficiency, and strategic placement of charging stations. However, terrain-aware path planning remains under-explored, despite its critical role in optimizing UAV SAR deployments. To address this gap, we present a computer-vision based terrain-aware mission planner that autonomously extracts and analyzes terrain topology to enhance SAR pre-flight planning. Our framework uses a deep segmentation network fine-tuned on our own collection of landcover datasets to transform satellite imagery into a structured, grid-based representation of the operational area. This classification enables terrain-specific UAV-task allocation, improving deployment strategies in complex environments. We address the challenge of irregular terrain partitions, by introducing a two-stage partitioning scheme that first evaluates terrain monotonicity along coordinate axes before applying a cost-based recursive partitioning process, minimizing unnecessary splits and optimizing path efficiency. Empirical validation in a high-fidelity simulation environment demonstrates that our approach improves search and dispatch time over multiple meta-heuristic techniques and against a competing state-of-the-art method. These results highlight its potential for large-scale SAR operations, where rapid response and efficient UAV coordination are critical.
comment: 8 pages, 4 figures,
Multi-source Plume Tracing via Multi-Agent Reinforcement Learning
Industrial catastrophes like the Bhopal disaster (1984) and the Aliso Canyon gas leak (2015) demonstrate the urgent need for rapid and reliable plume tracing algorithms to protect public health and the environment. Traditional methods, such as gradient-based or biologically inspired approaches, often fail in realistic, turbulent conditions. To address these challenges, we present a Multi-Agent Reinforcement Learning (MARL) algorithm designed for localizing multiple airborne pollution sources using a swarm of small uncrewed aerial systems (sUAS). Our method models the problem as a Partially Observable Markov Game (POMG), employing a Long Short-Term Memory (LSTM)-based Action-specific Double Deep Recurrent Q-Network (ADDRQN) that uses full sequences of historical action-observation pairs, effectively approximating latent states. Unlike prior work, we use a general-purpose simulation environment based on the Gaussian Plume Model (GPM), incorporating realistic elements such as a three-dimensional environment, sensor noise, multiple interacting agents, and multiple plume sources. The incorporation of action histories as part of the inputs further enhances the adaptability of our model in complex, partially observable environments. Extensive simulations show that our algorithm significantly outperforms conventional approaches. Specifically, our model allows agents to explore only 1.29\% of the environment to successfully locate pollution sources.
comment: 13 pages, 7 figures
Static and Repeated Cooperative Games for the Optimization of the AoI in IoT Networks
Wireless sensing and the internet of things (IoT) are nowadays pervasive in 5G and beyond networks, and they are expected to play a crucial role in 6G. However, a centralized optimization of a distributed system is not always possible and cost-efficient. In this paper, we analyze a setting in which two sensors collaboratively update a common server seeking to minimize the age of information (AoI) of the latest sample of a common physical process. We consider a distributed and uncoordinated setting where each sensor lacks information about whether the other decides to update the server. This strategic setting is modeled through game theory (GT) and two games are defined: i) a static game of complete information with an incentive mechanism for cooperation, and ii) a repeated game over a finite horizon where the static game is played at each stage. We perform a mathematical analysis of the static game finding three Nash Equilibria (NEs) in pure strategies and one in mixed strategies. A numerical simulation of the repeated game is also presented and novel and valuable insight into the setting is given thanks to the definition of a new metric, the price of delayed updates (PoDU), which shows that the decentralized solution provides results close to the centralized optimum.
comment: Accepted to MedComNet 2025 (Cagliari, June 2025). 6 pages, 7 figures
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation
In real-world multi-agent systems (MASs), observation delays are ubiquitous, preventing agents from making decisions based on the environment's true state. An individual agent's local observation often consists of multiple components from other agents or dynamic entities in the environment. These discrete observation components with varying delay characteristics pose significant challenges for multi-agent reinforcement learning (MARL). In this paper, we first formulate the decentralized stochastic individual delay partially observable Markov decision process (DSID-POMDP) by extending the standard Dec-POMDP. We then propose the Rainbow Delay Compensation (RDC), a MARL training framework for addressing stochastic individual delays, along with recommended implementations for its constituent modules. We implement the DSID-POMDP's observation generation pattern using standard MARL benchmarks, including MPE and SMAC. Experiments demonstrate that baseline MARL methods suffer severe performance degradation under fixed and unfixed delays. The RDC-enhanced approach mitigates this issue, remarkably achieving ideal delay-free performance in certain delay scenarios while maintaining generalizability. Our work provides a novel perspective on multi-agent delayed observation problems and offers an effective solution framework. The source code is available at https://anonymous.4open.science/r/RDC-pymarl-4512/.
comment: The code will be open-sourced in the RDC-pymarl project under https://github.com/linkjoker1006
Integrating Expert Knowledge into Logical Programs via LLMs
This paper introduces ExKLoP, a novel framework designed to evaluate how effectively Large Language Models (LLMs) integrate expert knowledge into logical reasoning systems. This capability is especially valuable in engineering, where expert knowledge-such as manufacturer-recommended operational ranges-can be directly embedded into automated monitoring systems. By mirroring expert verification steps, tasks like range checking and constraint validation help ensure system safety and reliability. Our approach systematically evaluates LLM-generated logical rules, assessing both syntactic fluency and logical correctness in these critical validation tasks. We also explore the models' capacity for self-correction via an iterative feedback loop based on code execution outcomes. ExKLoP presents an extensible dataset comprising 130 engineering premises, 950 prompts, and corresponding validation points. It enables comprehensive benchmarking while allowing control over task complexity and scalability of experiments. We leverage the synthetic data creation methodology to conduct extensive empirical evaluation on a diverse set of LLMs including Llama3, Gemma3, Codestral and QwenCoder. The results reveal that most models generate nearly perfect syntactically correct code and exhibit strong performance in translating expert knowledge into correct code. At the same time, while most LLMs produce nearly flawless syntactic output, their ability to correctly implement logical rules varies, as does their capacity for self-improvement. Overall, ExKLoP serves as a robust evaluation platform that streamlines the selection of effective models for self-correcting systems while clearly delineating the types of errors encountered.
A Multi-Agent Rollout Approach for Highway Bottleneck Decongestion in Mixed Autonomy SC
The integration of autonomous vehicles (AVs) into the existing transportation infrastructure offers a promising solution to alleviate congestion and enhance mobility. This research explores a novel approach to traffic optimization by employing a multi-agent rollout approach within a mixed autonomy environment. The study concentrates on coordinating the speed of human-driven vehicles by longitudinally controlling AVs, aiming to dynamically optimize traffic flow and alleviate congestion at highway bottlenecks in real-time. We model the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and propose an improved multi-agent rollout algorithm. By employing agent-by-agent policy iterations, our approach implicitly considers cooperation among multiple agents and seamlessly adapts to complex scenarios where the number of agents dynamically varies. Validated in a real-world network with varying AV penetration rates and traffic flow, the simulations demonstrate that the multi-agent rollout algorithm significantly enhances performance, reducing average travel time on bottleneck segments by 9.42% with a 10% AV penetration rate.
comment: Accepted by the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)
Beyond the Tragedy of the Commons: Building A Reputation System for Generative Multi-agent Systems
The tragedy of the commons, where individual self-interest leads to collectively disastrous outcomes, is a pervasive challenge in human society. Recent studies have demonstrated that similar phenomena can arise in generative multi-agent systems (MASs). To address this challenge, this paper explores the use of reputation systems as a remedy. We propose RepuNet, a dynamic, dual-level reputation framework that models both agent-level reputation dynamics and system-level network evolution. Specifically, driven by direct interactions and indirect gossip, agents form reputations for both themselves and their peers, and decide whether to connect or disconnect other agents for future interactions. Through two distinct scenarios, we show that RepuNet effectively mitigates the 'tragedy of the commons', promoting and sustaining cooperation in generative MASs. Moreover, we find that reputation systems can give rise to rich emergent behaviors in generative MASs, such as the formation of cooperative clusters, the social isolation of exploitative agents, and the preference for sharing positive gossip rather than negative ones.
Exploring Generative AI Techniques in Government: A Case Study
The swift progress of Generative Artificial intelligence (GenAI), notably Large Language Models (LLMs), is reshaping the digital landscape. Recognizing this transformative potential, the National Research Council of Canada (NRC) launched a pilot initiative to explore the integration of GenAI techniques into its daily operation for performance excellence, where 22 projects were launched in May 2024. Within these projects, this paper presents the development of the intelligent agent Pubbie as a case study, targeting the automation of performance measurement, data management and insight reporting at the NRC. Cutting-edge techniques are explored, including LLM orchestration and semantic embedding via RoBERTa, while strategic fine-tuning and few-shot learning approaches are incorporated to infuse domain knowledge at an affordable cost. The user-friendly interface of Pubbie allows general government users to input queries in natural language and easily upload or download files with a simple button click, greatly reducing manual efforts and accessibility barriers.
comment: In submission to IEEE Intelligent Systems
LLM Multi-Agent Systems: Challenges and Open Problems
This paper explores multi-agent systems and identify challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents, multi-agent systems can tackle complex tasks through agent collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore potential applications of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.
Systems and Control (CS)
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments-nearly four times higher than policies trained with robot data only-and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions at https://dexwild.github.io
comment: In RSS 2025. Website at https://dexwild.github.io
Non-Conservative Data-driven Safe Control Design for Nonlinear Systems with Polyhedral Safe Sets
This paper presents a data-driven nonlinear safe control design approach for discrete-time systems under parametric uncertainties and additive disturbances. We first characterize a new control structure from which a data-based representation of closed-loop systems is obtained. This data-based closed-loop system is composed of two parts: 1) a parametrized linear closed-loop part and a parametrized nonlinear remainder closed-loop part. We show that using the standard practice or learning a robust controller to ensure safety while treating the remaining nonlinearities as disturbances brings about significant challenges in terms of computational complexity and conservatism. To overcome these challenges, we develop a novel nonlinear safe control design approach in which the closed-loop nonlinear remainders are learned, rather than canceled, in a control-oriented fashion while preserving the computational efficiency. To this end, a primal-dual optimization framework is leveraged in which the control gains are learned to enforce the second-order optimality on the closed-loop nonlinear remainders. This allows us to account for nonlinearities in the design for the sake of safety rather than treating them as disturbances. This new controller parameterization and design approach reduces the computational complexity and the conservatism of designing a safe nonlinear controller. A simulation example is then provided to show the effectiveness of the proposed data-driven controller.
Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control
Industrial automation increasingly demands energy-efficient control strategies to balance performance with environmental and cost constraints. In this work, we present a multi-objective reinforcement learning (MORL) framework for energy-efficient control of the Quanser Aero 2 testbed in its one-degree-of-freedom configuration. We design a composite reward function that simultaneously penalizes tracking error and electrical power consumption. Preliminary experiments explore the influence of varying the Energy penalty weight, alpha, on the trade-off between pitch tracking and energy savings. Our results reveal a marked performance shift for alpha values between 0.0 and 0.25, with non-Pareto optimal solutions emerging at lower alpha values, on both the simulation and the real system. We hypothesize that these effects may be attributed to artifacts introduced by the adaptive behavior of the Adam optimizer, which could bias the learning process and favor bang-bang control strategies. Future work will focus on automating alpha selection through Gaussian Process-based Pareto front modeling and transitioning the approach from simulation to real-world deployment.
comment: Accepted at DEXA 2025 (AI4IP)
Finite-Sample-Based Reachability for Safe Control with Gaussian Process Dynamics
Gaussian Process (GP) regression is shown to be effective for learning unknown dynamics, enabling efficient and safety-aware control strategies across diverse applications. However, existing GP-based model predictive control (GP-MPC) methods either rely on approximations, thus lacking guarantees, or are overly conservative, which limits their practical utility. To close this gap, we present a sampling-based framework that efficiently propagates the model's epistemic uncertainty while avoiding conservatism. We establish a novel sample complexity result that enables the construction of a reachable set using a finite number of dynamics functions sampled from the GP posterior. Building on this, we design a sampling-based GP-MPC scheme that is recursively feasible and guarantees closed-loop safety and stability with high probability. Finally, we showcase the effectiveness of our method on two numerical examples, highlighting accurate reachable set over-approximation and safe closed-loop performance.
Congestion-Sensitive Grid Aggregation for DC Optimal Power Flow
The vast spatial dimension of modern interconnected electricity grids challenges the tractability of the DC optimal power flow problem. Grid aggregation methods try to overcome this challenge by reducing the number of network elements. Many existing methods use Locational Marginal Prices as a distance metric to cluster nodes. In this paper, we show that prevalent methods adopting this distance metric fail to adequately capture the impact of individual lines when there is more than one line congested. This leads to suboptimal outcomes for the optimization of the aggregated model. To overcome those issues, we propose two methods based on the novel Network Congestion Price metric, which preserves the impact of nodal power injections on individual line congestions. The proposed methods are compared to several existing aggregation methods based on Locational Marginal Prices. We demonstrate all methods on adapted versions of the IEEE RTS 24- and 300-Bus systems. We show that the proposed methods outperform existing approaches both in terms of objective function value error and maximum line limit violation, while exhibiting faster node clustering. We conclude that aggregation methods based on the novel Network Congestion Price metric are better at preserving the essential physical characteristics of the network topology in the grid aggregation process than methods based on Locational Marginal Prices.
comment: Accepted for IEEE PES PowerTech 2025
Integrated Localization and Path Planning for an Ocean Exploring Team of Autonomous Underwater Vehicles with Consensus Graph Model Predictive Control
Navigation of a team of autonomous underwater vehicles (AUVs) coordinated by an unmanned surface vehicle (USV) is efficient and reliable for deep ocean exploration. AUVs depart from and return to the USV after collaborative navigation, data collection, and ocean exploration missions. Efficient path planning and accurate localization are essential, the latter of which is critical due to the lack of global localization signals and poor radio frequency (RF) communication in deep waters. Inertial navigation and acoustic communication are common solutions for localization. However, the former is subject to odometry drifts, and the latter is limited to short distances. This paper proposes a systematic approach for localization-aware energy-efficient collision-free path planning for a USV-AUVs team. Path planning is formulated as finite receding horizon model predictive control (MPC) optimization. A dynamic-aware linear kinodynamic motion equation is developed. The mathematical formulation for the MPC optimization is effectively developed where localization is integrated as consensus graph optimization among AUV nodes. Edges in the optimized AUV-to-USV (A2U) and AUV-to-AUV (A2A) graphs are constrained to the sonar range of acoustic modems. The time complexity of the consensus MPC optimization problem is analyzed, revealing a nonconvex NP-hard problem, which is solved using sequential convex programming. Numerical simulation results are provided to evaluate the proposed method.
Optimal Low Emission Zones scheduling as an example of transport policy backcasting
This study presents a backcasting approach that considers the passenger car fleet dynamics to determine optimal policy roadmaps in transport systems. As opposed to the scenario-based approach, backcasting sets emission reduction targets first, then identifies policies that meet the constraint. The policy is the implementation of Low Emission Zones (LEZs), in the Ile-de-France region as a case study. The aim is to minimize the number of scrapped vehicles due to LEZs under CO2 emission targets and to deduce an interdiction schedule of polluting vehicles by 2050. To explore potential solutions, we use a genetic algorithm that provides a first insight into optimal policy pathways.
Energy personas in Danish households
Technologies to monitor the provision of renewable energy are part of emerging technologies to help address the discrepancy between renewable energy production and its related usage in households. This paper presents various ways householders use a technological artifact for the real-time monitoring of renewable energy provision. Such a monitoring thus affords householders with an opportunity to adjust their energy consumption according to renewable energy provision. In Denmark, Ewii, previously Barry, is a Danish energy supplier which provides householders with an opportunity to monitor energy sources in real time through a technological solution of the same name. This paper use provision afforded by Ewii as a case for exploring how householders organize themselves to use a technological artefact that supports the monitoring of energy and its related usage. This study aims to inform technology design through the derivation of four personas. The derived personas highlight the differences in energy monitoring practices for the householders and their engagement. These personas are characterised as dedicated, organised, sporadic, and convenient. Understanding these differences in energy monitoring practice using the technological artefact form a solid element in the design of future energy technologies that interfere with the everyday practices and energy consumption for households. This is paramount for future energy related technology design, and for the clarification of usage assumptions that are embedded in the rollout of energy related technology as a country such as Denmark moves through its green transition.
Towards Autonomous 1/8th Offroad RC Racing -- The TruggySense Educational Platform
This paper presents a state-of-the-art Data Acquisition System designed for off-road conditions, deployed on a Team Corally Kagama 1/8 Remote Controlled Vehicle. The system is intended to support Advanced Driver Assistance Systems in an educational context by providing valuable, consistent, and representative data. Key measurement systems are discussed to enable insights into the Remote Controlled Vehicles stability during and after off-road races. Furthermore, four experiments where conducted to evaluate the Data Acquisition Systems accuracy, stability, and consistency in replicating real-world vehicle behavior. The proposed Data Acquisition System platform serves as a solid foundation for use in engineering education, enabling integration with various Advanced Driver Assistance Systems algorithms to enhance vehicle control and overall performance, offering a new dimension to off-road racing. Additionally, realtime telemetry enables verification and validation of Advanced Driver Assistance Systems algorithms based on the live operating state of the Radio Controlled Vehicle during races
Anti-windup design for internal model online constrained optimization
This paper proposes a novel algorithmic design procedure for online constrained optimization grounded in control-theoretic principles. By integrating the Internal Model Principle (IMP) with an anti-windup compensation mechanism, the proposed Projected-Internal Model Anti-Windup (P-IMAW) gradient descent exploits a partial knowledge of the temporal evolution of the cost function to enhance tracking performance. The algorithm is developed through a structured synthesis procedure: first, a robust controller leveraging the IMP ensures asymptotic convergence in the unconstrained setting. Second, an anti-windup augmentation guarantees stability and performance in the presence of the projection operator needed to satisfy the constraints. The effectiveness of the proposed approach is demonstrated through numerical simulations comparing it against other classical techniques.
comment: 8 pages, 8 figures, submitted to IEEE CDC 2025
High Performance Signal Design for ACO-OFDM Systems using Variational Autoencoder
This letter proposes a design of low peak-to-average power ratio (PAPR), low symbol error rate (SER), and high data rate signal for asymmetrically clipped optical orthogonal frequency division multiplexing (ACO-OFDM) systems. The proposed design leverages a variational autoencoder (VAE) incorporating gradual loss learning to jointly optimize the geometry and probability of the constellation's symbols. This not only enhances mutual information (MI) but also effectively reduces the PAPR while maintaining a low SER for reliable transmission. We evaluate the performance of the proposed VAE-based design by comparing the MI, SER, and PAPR against existing techniques. Simulation results demonstrate that the proposed method achieves a considerably lower PAPR while maintaining superior SER and MI performance for a wide range of SNRs.
Stiffness-based Analytic Centre Method for Cable-Driven Parallel Robots
Nowadays, being fast and precise are key requirements in Robotics. This work introduces a novel methodology to tune the stiffness of Cable-Driven Parallel Robots (CDPRs) while simultaneously addressing the tension distribution problem. In particular, the approach relies on the Analytic-Centre method. Indeed, weighting the barrier functions makes natural the stiffness adaptation. The intrinsic ability to adjust the stiffness during the execution of the task enables the CDPRs to effectively meet above-mentioned requirements. The capabilities of the method are demonstrated through simulations by comparing it with the existing approach.
comment: Conference paper
Interpretable Event Diagnosis in Water Distribution Networks
The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events. In this work, we propose a framework for interpretable event diagnosis -- an approach that assists the operators in associating the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results provided by fault diagnosis algorithms; their aim is to improve the understanding of the algorithm's inner workings by the operators, thus enabling them to take a more informed decision by combining the results with their personal experiences. Specifically, we propose counterfactual event fingerprints, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented in a graphical way. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.
HuB: Learning Extreme Humanoid Balance
The human body demonstrates exceptional motor capabilities-such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters-both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In this work, we identify three key obstacles: instability from reference motion errors, learning difficulties due to morphological mismatch, and the sim-to-real gap caused by sensor noise and unmodeled dynamics. To address these challenges, we propose HuB (Humanoid Balance), a unified framework that integrates reference motion refinement, balance-aware policy learning, and sim-to-real robustness training, with each component targeting a specific challenge. We validate our approach on the Unitree G1 humanoid robot across challenging quasi-static balance tasks, including extreme single-legged poses such as Swallow Balance and Bruce Lee's Kick. Our policy remains stable even under strong physical disturbances-such as a forceful soccer strike-while baseline methods consistently fail to complete these tasks. Project website: https://hub-robot.github.io
comment: Project website: https://hub-robot.github.io
Learning Quasi-LPV Models and Robust Control Invariant Sets with Reduced Conservativeness
We present an approach to identify a quasi Linear Parameter Varying (qLPV) model of a plant, with the qLPV model guaranteed to admit a robust control invariant (RCI) set. It builds upon the concurrent synthesis framework presented in [1], in which the requirement of existence of an RCI set is modeled as a control-oriented regularization. Here, we reduce the conservativeness of the approach by bounding the qLPV system with an uncertain LTI system, which we derive using bound propagation approaches. The resulting regularization function is the optimal value of a nonlinear robust optimization problem that we solve via a differentiable algorithm. We numerically demonstrate the benefits of the proposed approach over two benchmark approaches.
comment: 12 pages, 3 figures
Continuous-Time Control Synthesis for Multiple Quadrotors under Signal Temporal Logic Specifications
Ensuring continuous-time control of multiple quadrotors in constrained environments under signal temporal logic (STL) specifications is challenging due to nonlinear dynamics, safety constraints, and disturbances. This letter proposes a two-stage framework to address this challenge. First, exponentially decaying tracking error bounds are derived with multidimensional geometric control gains obtained via differential evolution. These bounds are less conservative, while the resulting tracking errors exhibit smaller oscillations and improved transient performance. Second, leveraging the time-varying bounds, a mixed-integer convex programming (MICP) formulation generates piecewise B\'ezier reference trajectories that satisfy STL and velocity limits, while ensuring inter-agent safety through convex-hull properties. Simulation results demonstrate that the proposed approach enables formally verifiable multi-agent coordination in constrained environments, with provable tracking guarantees under bounded disturbances.
A Novel Online Pseudospectral Method for Approximation of Nonlinear Systems Dynamics
This paper presents a novel online system identification approach utilizing the Chebyshev pseudospectral (PS) method to approximate the dynamics of a continuous-time nonlinear system. Unlike conventional periodic sampling, the proposed identification scheme employs aperiodic state sampling, leveraging the Chebyshev nodes to achieve the desired approximation accuracy. Unlike traditional off-line PS approaches, the scheme utilizes a moving time-window strategy to compute the sampling instants (Chebyshev nodes) forward in time for state measurements. Within each time window, the number of sampling instants is adaptively determined to meet the specified approximation accuracy. The Chebyshev basis is also shifted for each time window to accommodate arbitrary approximation intervals. The least-squares approach is employed to estimate the coefficients of the shifted Chebyshev basis functions using the measured state and its derivatives at the end of each window, resulting in a piecewise approximation of the drift dynamics of the system. In the second step, the identified drift dynamics are utilized to design an adaptive state estimator to reconstruct the continuous system states from the aperiodic state measurements. To address the smoothness of the piecewise approximated system dynamics, the Chebyshev coefficients are recomputed at the window transition instants, between two consecutive time windows, by enforcing continuity of the approximated function and its derivatives. In addition, analytical results are provided to determine the number of sampling instants within each time window, which guarantees the desired approximation accuracy. The boundedness of function approximation and state estimation errors is also proved analytically. Finally, numerical simulation results are included to validate the proposed scheme.
comment: 12 pages, 4 figures, Substantially extends our earlier paper accepted for ACC 2025 (Denver, 8 Jul 2025). Submitted to IEEE Transactions on Automatic Control; under review
Economic data-enabled predictive control using machine learning
In this paper, we propose a convex data-based economic predictive control method within the framework of data-enabled predictive control (DeePC). Specifically, we use a neural network to transform the system output into a new state space, where the nonlinear economic cost function of the underlying nonlinear system is approximated using a quadratic function expressed by the transformed output in the new state space. Both the neural network parameters and the coefficients of the quadratic function are learned from open-loop data of the system. Additionally, we reconstruct constrained output variables from the transformed output through learning an output reconstruction matrix; this way, the proposed economic DeePC can handle output constraints explicitly. The performance of the proposed method is evaluated via a case study in a simulated chemical process.
comment: 6 pages, 2 figures
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: 21 pages, 8 figures, 2 author photos. Affiliations: Electrical Engineering Department, Penn State University. Keywords: Economic Nonlinear Model Predictive Control, (e)NMPC, Reinforcement Learning, Surrogate Models, System Identification
Safety and optimality in learning-based control at low computational cost
Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degrees-of-freedom robot arm.
comment: Accepted final version to appear in the IEEE Transactions on Automatic Control
VoI-Driven Joint Optimization of Control and Communication in Vehicular Digital Twin Network
The vision of sixth-generation (6G) wireless networks paves the way for the seamless integration of digital twins into vehicular networks, giving rise to a Vehicular Digital Twin Network (VDTN). The large amount of computing resources as well as the massive amount of spatial-temporal data in Digital Twin (DT) domain can be utilized to enhance the communication and control performance of Internet of Vehicle (IoV) systems. In this article, we first propose the architecture of VDTN, emphasizing key modules that center on functions related to the joint optimization of control and communication. We then delve into the intricacies of the multitimescale decision process inherent in joint optimization in VDTN, specifically investigating the dynamic interplay between control and communication. To facilitate the joint optimization, we define two Value of Information (VoI) concepts rooted in control performance. Subsequently, utilizing VoI as a bridge between control and communication, we introduce a novel joint optimization framework, which involves iterative processing of two Deep Reinforcement Learning (DRL) modules corresponding to control and communication to derive the optimal policy. Finally, we conduct simulations of the proposed framework applied to a platoon scenario to demonstrate its effectiveness in ensu
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.
Distributed Coordination of Grid-Forming and Grid-Following Inverters for Optimal Frequency Control in Power Systems
The large-scale integration of inverter-interfaced renewable energy sources presents significant challenges to maintaining power balance and nominal frequency in modern power systems. This paper studies grid-level coordinated control of grid-forming (GFM) and grid-following (GFL) inverter-based resources (IBRs) for scalable and optimal frequency control. We propose a fully distributed optimal frequency control algorithm based on the projected primal-dual gradient method and by leveraging the structure of the underlying physical system dynamics. The proposed algorithm i) restores the nominal system frequency while minimizing total control cost and enforcing IBR power capacity limits and line thermal constraints, and ii) operates in a distributed manner that only needs local measurements and neighbor-to-neighbor communication. In particular, when the line thermal constraints are disregarded, the proposed algorithm admits a fully local implementation that requires no communication, while still ensuring optimality and satisfying IBR power capacity limits. We establish the global asymptotic convergence of the algorithm using Lyapunov stability analysis. The effectiveness and optimality of the proposed algorithms are validated through high-fidelity, 100% inverter-based electromagnetic transient (EMT) simulations on the IEEE 39-bus system.
Energetic Resilience of Linear Driftless Systems
When a malfunction causes a control system to lose authority over a subset of its actuators, achieving a task may require spending additional energy in order to compensate for the effect of uncontrolled inputs. To understand this increase in energy, we introduce an energetic resilience metric that quantifies the maximal additional energy required to achieve finite-time regulation in linear driftless systems that suffer this malfunction. We first derive optimal control signals and minimum energies to achieve this task in both the nominal and malfunctioning systems. We then obtain a bound on the worst-case energy used by the malfunctioning system, and its exact expression in the special case of loss of authority over one actuator. Further considering this special case, we derive a bound on the metric for energetic resilience. A simulation example on a model of an underwater robot demonstrates that this bound is useful in quantifying the increased energy used by a system suffering such a malfunction.
comment: 6 pages, 1 figure
Sparse identification of nonlinear dynamics with high accuracy and reliability under noisy conditions for applications to industrial systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. The method is applied to the diesel engine airpath systems, which are known as a nonlinear complicated industrial system. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, we propose an emphasized SINDy by incorporating ensemble learning, elite gathering, and classification techniques while keeping convex calculation. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always show the same trends. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. The simulation results show that the proposed method realizes multi-step prediction for the airpath system, which is known to be a complicated industrial system under noisy conditions.
Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller
Satellite dynamics in unknown environments are inherently uncertain due to factors such as varying gravitational fields, atmospheric drag, and unpredictable interactions with space debris or other celestial bodies. Traditional sliding mode controllers with fixed parameters often struggle to maintain optimal performance under these fluctuating conditions. Therefore, an adaptive controller is essential to address these challenges by continuously tuning its gains in real-time. In this paper, we have tuned the slopes of the Fixed-time Sliding surface adaptively using reinforcement learning for coupled rendezvous and docking maneuver of chaser satellite with the target satellite in an unknown space environment. The neural network model is used to determine the optimal gains of reaching law of the fixed-time sliding surface. We have assumed that we don't have an accurate model of the system so we have added noise in the tangent space instead of directly on the manifold to preserve the geometric structure of the system while ensuring mathematically consistent uncertainty propagation. The reinforcement learning is used as an approximator to represent the value function of the agent to estimate the dynamical model of the system using the Actor-Critic method. The proposed control algorithm integrates a neural network and a sliding mode controller in a cascade loop architecture, where the tracking error dynamically tunes the sliding surface gains. Global fixed-time stability of the closed-loop feedback system is proved within the Lyapunov framework. This comprehensive approach of fixed-time sliding mode controller using a Reinforcement Learning based ensures the completion of the mission efficiently while addressing the critical challenges posed by the uncertain environment. The simulation results presented support the claims made.
Analysis of Power Losses and the Efficacy of Power Minimization Strategies in Multichannel Electrical Stimulation Systems
Neuroprosthetic devices require multichannel stimulator systems with an increasing number of channels. However, there are inherent power losses in typical multichannel stimulation circuits caused by a mismatch between the power supply voltage and the voltage required at each electrode to successfully stimulate tissue. This imposes a bottleneck towards high-channel-count devices, which is particularly severe in wirelessly-powered devices. Hence, advances in the power efficiency of stimulation systems are critical. To support these advances, this paper presents a methodology to identify and quantify power losses associated with different power supply scaling strategies in multichannel stimulation systems. The proposed methodology utilizes distributions of stimulation amplitudes and electrode impedances to calculate power losses in multichannel systems. Experimental data from previously published studies spanning various stimulation applications were analyzed to evaluate the performance of fixed, global, and stepped supply scaling methods, focusing on their impact on power dissipation and efficiency. Variability in output conditions results in low power efficiency in multichannel stimulation systems across all applications. Stepped voltage scaling demonstrated substantial efficiency improvements, achieving an increase of 67 % to 146 %, particularly in high-channel-count applications with significant variability in tissue impedance. Global scaling, by contrast, was more advantageous for systems with fewer channels. The findings highlight the importance of tailoring power management strategies to specific applications to optimize efficiency while minimizing system complexity. The proposed methodology offers a framework for evaluating efficiency-complexity trade-offs, advancing the design of scalable neurostimulation systems.
comment: 24 pages, 11 figures
Modeling False Data Injection Attacks in Integrated Electricity-Gas Systems
This work studies the modeling of false data injection attacks (FDIAs) in integrated electricity-gas systems (IEGSs). First, we introduce a static state estimation model and bad data detection method for IEGSs. Then, we develop FDIAs on IEGSs with complete network topology and parameter information. Next, we develop FDIAs on IEGSs when intruders have only local network topology and parameter information of an IEGS. Lastly, we explore FDIAs on IEGSs when intruders have only local network topology information of an IEGS.
comment: 13 pages
Learning to Drive Anywhere with Model-Based Reannotation
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Exploring Generative AI Techniques in Government: A Case Study
The swift progress of Generative Artificial intelligence (GenAI), notably Large Language Models (LLMs), is reshaping the digital landscape. Recognizing this transformative potential, the National Research Council of Canada (NRC) launched a pilot initiative to explore the integration of GenAI techniques into its daily operation for performance excellence, where 22 projects were launched in May 2024. Within these projects, this paper presents the development of the intelligent agent Pubbie as a case study, targeting the automation of performance measurement, data management and insight reporting at the NRC. Cutting-edge techniques are explored, including LLM orchestration and semantic embedding via RoBERTa, while strategic fine-tuning and few-shot learning approaches are incorporated to infuse domain knowledge at an affordable cost. The user-friendly interface of Pubbie allows general government users to input queries in natural language and easily upload or download files with a simple button click, greatly reducing manual efforts and accessibility barriers.
comment: In submission to IEEE Intelligent Systems
Uncertainty-aware data-driven predictive control in a stochastic setting
Data-Driven Predictive Control (DDPC) has been recently proposed as an effective alternative to traditional Model Predictive Control (MPC), in that the same constrained optimization problem can be addressed without the need to explicitly identify a full model of the plant. However, DDPC is built upon input/output trajectories. Therefore, the finite sample effect of stochastic data, due to, e.g., measurement noise, may have a detrimental impact on closed-loop performance. Exploiting a formal statistical analysis of the prediction error, in this paper we propose the first systematic approach to deal with uncertainty due to finite sample effects. To this end, we introduce two regularization strategies for which, differently from existing regularization-based DDPC techniques, we propose a tuning rationale allowing us to select the regularization hyper-parameters before closing the loop and without additional experiments. Simulation results confirm the potential of the proposed strategy when closing the loop.
comment: 6 pages, 1 figure, this work has been submitted and accepted for publication at the IFAC World Congress 2023, Yokohama, Japan
Consensus Building in Human-robot Co-learning via Bias Controlled Nonlinear Opinion Dynamics and Non-verbal Communication through Robotic Eyes
Consensus between humans and robots is crucial as robotic agents become more prevalent and deeply integrated into our daily lives. This integration presents both unprecedented opportunities and notable challenges for effective collaboration. However, the active guidance of human actions and their integration in co-learning processes, where humans and robots mutually learn from each other, remains under-explored. This article demonstrates how consensus between human and robot opinions can be established by modeling decision-making processes as non-linear opinion dynamics. We utilize dynamic bias as a control parameter to steer the robot's opinion toward consensus and employ visual cues via a robotic eye gaze to guide human decisions. These non-verbal cues communicate the robot's future intentions, gradually guiding human decisions to align with them. To design robot behavior for consensus, we integrate a human opinion observation algorithm with the robot's opinion formation, controlling its actions based on that formed opinion. Experiments with $51$ participants ($N=51$) in a two-choice decision-making task show that effective consensus and trust can be established in a human--robot co-learning setting by guiding human decisions through nonverbal robotic cues and using bias-controlled opinion dynamics to shape robot behavior. Finally, we provide detailed information on the perceived cognitive load and the behavior of robotic eyes based on user feedback and post-experiment interviews.
comment: 48 pages, 8 figures
A Prudent Framework for Understanding Risk-Awareness in Demand Response
We show that risk-aware behaviors in demand response originate from superquadratic state-dependent cost functions and price uncertainty with skewed distributions. We obtain such results through developing a novel theoretical demand response framework that combines non-anticipatory multi-stage decision-making with superquadratic cost functions. We introduce the concept of prudent demand, defined by a positive third-order derivative of the cost function, which is the first principle for risk-averse behavior despite a risk-neutral objective. Our analysis establishes that future price uncertainty affects immediate consumption decisions, and the extent of this response scales proportionally with the skewness of the price distribution. We visualize our theoretical findings through numerical simulations and illustrate their practical implications using a real-world case study.
The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning
Deep Reinforcement Learning (DRL) has demonstrated impressive results in domains such as games and robotics, where task formulations are well-defined. However, few DRL benchmarks are grounded in complex, real-world environments, where safety constraints, partial observability, and the need for hand-engineered task representations pose significant challenges. To help bridge this gap, we introduce a testbed based on the pump scheduling problem in a real-world water distribution facility. The task involves controlling pumps to ensure a reliable water supply while minimizing energy consumption and respecting the constraints of the system. Our testbed includes a realistic simulator, three years of high-resolution (1-minute) operational data from human-led control, and a baseline RL task formulation. This testbed supports a wide range of research directions, including offline RL, safe exploration, inverse RL, and multi-objective optimization.
Systems and Control (EESS)
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments-nearly four times higher than policies trained with robot data only-and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions at https://dexwild.github.io
comment: In RSS 2025. Website at https://dexwild.github.io
Non-Conservative Data-driven Safe Control Design for Nonlinear Systems with Polyhedral Safe Sets
This paper presents a data-driven nonlinear safe control design approach for discrete-time systems under parametric uncertainties and additive disturbances. We first characterize a new control structure from which a data-based representation of closed-loop systems is obtained. This data-based closed-loop system is composed of two parts: 1) a parametrized linear closed-loop part and a parametrized nonlinear remainder closed-loop part. We show that using the standard practice or learning a robust controller to ensure safety while treating the remaining nonlinearities as disturbances brings about significant challenges in terms of computational complexity and conservatism. To overcome these challenges, we develop a novel nonlinear safe control design approach in which the closed-loop nonlinear remainders are learned, rather than canceled, in a control-oriented fashion while preserving the computational efficiency. To this end, a primal-dual optimization framework is leveraged in which the control gains are learned to enforce the second-order optimality on the closed-loop nonlinear remainders. This allows us to account for nonlinearities in the design for the sake of safety rather than treating them as disturbances. This new controller parameterization and design approach reduces the computational complexity and the conservatism of designing a safe nonlinear controller. A simulation example is then provided to show the effectiveness of the proposed data-driven controller.
Multi-Objective Reinforcement Learning for Energy-Efficient Industrial Control
Industrial automation increasingly demands energy-efficient control strategies to balance performance with environmental and cost constraints. In this work, we present a multi-objective reinforcement learning (MORL) framework for energy-efficient control of the Quanser Aero 2 testbed in its one-degree-of-freedom configuration. We design a composite reward function that simultaneously penalizes tracking error and electrical power consumption. Preliminary experiments explore the influence of varying the Energy penalty weight, alpha, on the trade-off between pitch tracking and energy savings. Our results reveal a marked performance shift for alpha values between 0.0 and 0.25, with non-Pareto optimal solutions emerging at lower alpha values, on both the simulation and the real system. We hypothesize that these effects may be attributed to artifacts introduced by the adaptive behavior of the Adam optimizer, which could bias the learning process and favor bang-bang control strategies. Future work will focus on automating alpha selection through Gaussian Process-based Pareto front modeling and transitioning the approach from simulation to real-world deployment.
comment: Accepted at DEXA 2025 (AI4IP)
Finite-Sample-Based Reachability for Safe Control with Gaussian Process Dynamics
Gaussian Process (GP) regression is shown to be effective for learning unknown dynamics, enabling efficient and safety-aware control strategies across diverse applications. However, existing GP-based model predictive control (GP-MPC) methods either rely on approximations, thus lacking guarantees, or are overly conservative, which limits their practical utility. To close this gap, we present a sampling-based framework that efficiently propagates the model's epistemic uncertainty while avoiding conservatism. We establish a novel sample complexity result that enables the construction of a reachable set using a finite number of dynamics functions sampled from the GP posterior. Building on this, we design a sampling-based GP-MPC scheme that is recursively feasible and guarantees closed-loop safety and stability with high probability. Finally, we showcase the effectiveness of our method on two numerical examples, highlighting accurate reachable set over-approximation and safe closed-loop performance.
Congestion-Sensitive Grid Aggregation for DC Optimal Power Flow
The vast spatial dimension of modern interconnected electricity grids challenges the tractability of the DC optimal power flow problem. Grid aggregation methods try to overcome this challenge by reducing the number of network elements. Many existing methods use Locational Marginal Prices as a distance metric to cluster nodes. In this paper, we show that prevalent methods adopting this distance metric fail to adequately capture the impact of individual lines when there is more than one line congested. This leads to suboptimal outcomes for the optimization of the aggregated model. To overcome those issues, we propose two methods based on the novel Network Congestion Price metric, which preserves the impact of nodal power injections on individual line congestions. The proposed methods are compared to several existing aggregation methods based on Locational Marginal Prices. We demonstrate all methods on adapted versions of the IEEE RTS 24- and 300-Bus systems. We show that the proposed methods outperform existing approaches both in terms of objective function value error and maximum line limit violation, while exhibiting faster node clustering. We conclude that aggregation methods based on the novel Network Congestion Price metric are better at preserving the essential physical characteristics of the network topology in the grid aggregation process than methods based on Locational Marginal Prices.
comment: Accepted for IEEE PES PowerTech 2025
Integrated Localization and Path Planning for an Ocean Exploring Team of Autonomous Underwater Vehicles with Consensus Graph Model Predictive Control
Navigation of a team of autonomous underwater vehicles (AUVs) coordinated by an unmanned surface vehicle (USV) is efficient and reliable for deep ocean exploration. AUVs depart from and return to the USV after collaborative navigation, data collection, and ocean exploration missions. Efficient path planning and accurate localization are essential, the latter of which is critical due to the lack of global localization signals and poor radio frequency (RF) communication in deep waters. Inertial navigation and acoustic communication are common solutions for localization. However, the former is subject to odometry drifts, and the latter is limited to short distances. This paper proposes a systematic approach for localization-aware energy-efficient collision-free path planning for a USV-AUVs team. Path planning is formulated as finite receding horizon model predictive control (MPC) optimization. A dynamic-aware linear kinodynamic motion equation is developed. The mathematical formulation for the MPC optimization is effectively developed where localization is integrated as consensus graph optimization among AUV nodes. Edges in the optimized AUV-to-USV (A2U) and AUV-to-AUV (A2A) graphs are constrained to the sonar range of acoustic modems. The time complexity of the consensus MPC optimization problem is analyzed, revealing a nonconvex NP-hard problem, which is solved using sequential convex programming. Numerical simulation results are provided to evaluate the proposed method.
Optimal Low Emission Zones scheduling as an example of transport policy backcasting
This study presents a backcasting approach that considers the passenger car fleet dynamics to determine optimal policy roadmaps in transport systems. As opposed to the scenario-based approach, backcasting sets emission reduction targets first, then identifies policies that meet the constraint. The policy is the implementation of Low Emission Zones (LEZs), in the Ile-de-France region as a case study. The aim is to minimize the number of scrapped vehicles due to LEZs under CO2 emission targets and to deduce an interdiction schedule of polluting vehicles by 2050. To explore potential solutions, we use a genetic algorithm that provides a first insight into optimal policy pathways.
Energy personas in Danish households
Technologies to monitor the provision of renewable energy are part of emerging technologies to help address the discrepancy between renewable energy production and its related usage in households. This paper presents various ways householders use a technological artifact for the real-time monitoring of renewable energy provision. Such a monitoring thus affords householders with an opportunity to adjust their energy consumption according to renewable energy provision. In Denmark, Ewii, previously Barry, is a Danish energy supplier which provides householders with an opportunity to monitor energy sources in real time through a technological solution of the same name. This paper use provision afforded by Ewii as a case for exploring how householders organize themselves to use a technological artefact that supports the monitoring of energy and its related usage. This study aims to inform technology design through the derivation of four personas. The derived personas highlight the differences in energy monitoring practices for the householders and their engagement. These personas are characterised as dedicated, organised, sporadic, and convenient. Understanding these differences in energy monitoring practice using the technological artefact form a solid element in the design of future energy technologies that interfere with the everyday practices and energy consumption for households. This is paramount for future energy related technology design, and for the clarification of usage assumptions that are embedded in the rollout of energy related technology as a country such as Denmark moves through its green transition.
Towards Autonomous 1/8th Offroad RC Racing -- The TruggySense Educational Platform
This paper presents a state-of-the-art Data Acquisition System designed for off-road conditions, deployed on a Team Corally Kagama 1/8 Remote Controlled Vehicle. The system is intended to support Advanced Driver Assistance Systems in an educational context by providing valuable, consistent, and representative data. Key measurement systems are discussed to enable insights into the Remote Controlled Vehicles stability during and after off-road races. Furthermore, four experiments where conducted to evaluate the Data Acquisition Systems accuracy, stability, and consistency in replicating real-world vehicle behavior. The proposed Data Acquisition System platform serves as a solid foundation for use in engineering education, enabling integration with various Advanced Driver Assistance Systems algorithms to enhance vehicle control and overall performance, offering a new dimension to off-road racing. Additionally, realtime telemetry enables verification and validation of Advanced Driver Assistance Systems algorithms based on the live operating state of the Radio Controlled Vehicle during races
Anti-windup design for internal model online constrained optimization
This paper proposes a novel algorithmic design procedure for online constrained optimization grounded in control-theoretic principles. By integrating the Internal Model Principle (IMP) with an anti-windup compensation mechanism, the proposed Projected-Internal Model Anti-Windup (P-IMAW) gradient descent exploits a partial knowledge of the temporal evolution of the cost function to enhance tracking performance. The algorithm is developed through a structured synthesis procedure: first, a robust controller leveraging the IMP ensures asymptotic convergence in the unconstrained setting. Second, an anti-windup augmentation guarantees stability and performance in the presence of the projection operator needed to satisfy the constraints. The effectiveness of the proposed approach is demonstrated through numerical simulations comparing it against other classical techniques.
comment: 8 pages, 8 figures, submitted to IEEE CDC 2025
High Performance Signal Design for ACO-OFDM Systems using Variational Autoencoder
This letter proposes a design of low peak-to-average power ratio (PAPR), low symbol error rate (SER), and high data rate signal for asymmetrically clipped optical orthogonal frequency division multiplexing (ACO-OFDM) systems. The proposed design leverages a variational autoencoder (VAE) incorporating gradual loss learning to jointly optimize the geometry and probability of the constellation's symbols. This not only enhances mutual information (MI) but also effectively reduces the PAPR while maintaining a low SER for reliable transmission. We evaluate the performance of the proposed VAE-based design by comparing the MI, SER, and PAPR against existing techniques. Simulation results demonstrate that the proposed method achieves a considerably lower PAPR while maintaining superior SER and MI performance for a wide range of SNRs.
Stiffness-based Analytic Centre Method for Cable-Driven Parallel Robots
Nowadays, being fast and precise are key requirements in Robotics. This work introduces a novel methodology to tune the stiffness of Cable-Driven Parallel Robots (CDPRs) while simultaneously addressing the tension distribution problem. In particular, the approach relies on the Analytic-Centre method. Indeed, weighting the barrier functions makes natural the stiffness adaptation. The intrinsic ability to adjust the stiffness during the execution of the task enables the CDPRs to effectively meet above-mentioned requirements. The capabilities of the method are demonstrated through simulations by comparing it with the existing approach.
comment: Conference paper
Interpretable Event Diagnosis in Water Distribution Networks
The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events. In this work, we propose a framework for interpretable event diagnosis -- an approach that assists the operators in associating the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results provided by fault diagnosis algorithms; their aim is to improve the understanding of the algorithm's inner workings by the operators, thus enabling them to take a more informed decision by combining the results with their personal experiences. Specifically, we propose counterfactual event fingerprints, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented in a graphical way. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.
HuB: Learning Extreme Humanoid Balance
The human body demonstrates exceptional motor capabilities-such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters-both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In this work, we identify three key obstacles: instability from reference motion errors, learning difficulties due to morphological mismatch, and the sim-to-real gap caused by sensor noise and unmodeled dynamics. To address these challenges, we propose HuB (Humanoid Balance), a unified framework that integrates reference motion refinement, balance-aware policy learning, and sim-to-real robustness training, with each component targeting a specific challenge. We validate our approach on the Unitree G1 humanoid robot across challenging quasi-static balance tasks, including extreme single-legged poses such as Swallow Balance and Bruce Lee's Kick. Our policy remains stable even under strong physical disturbances-such as a forceful soccer strike-while baseline methods consistently fail to complete these tasks. Project website: https://hub-robot.github.io
comment: Project website: https://hub-robot.github.io
Learning Quasi-LPV Models and Robust Control Invariant Sets with Reduced Conservativeness
We present an approach to identify a quasi Linear Parameter Varying (qLPV) model of a plant, with the qLPV model guaranteed to admit a robust control invariant (RCI) set. It builds upon the concurrent synthesis framework presented in [1], in which the requirement of existence of an RCI set is modeled as a control-oriented regularization. Here, we reduce the conservativeness of the approach by bounding the qLPV system with an uncertain LTI system, which we derive using bound propagation approaches. The resulting regularization function is the optimal value of a nonlinear robust optimization problem that we solve via a differentiable algorithm. We numerically demonstrate the benefits of the proposed approach over two benchmark approaches.
comment: 12 pages, 3 figures
Continuous-Time Control Synthesis for Multiple Quadrotors under Signal Temporal Logic Specifications
Ensuring continuous-time control of multiple quadrotors in constrained environments under signal temporal logic (STL) specifications is challenging due to nonlinear dynamics, safety constraints, and disturbances. This letter proposes a two-stage framework to address this challenge. First, exponentially decaying tracking error bounds are derived with multidimensional geometric control gains obtained via differential evolution. These bounds are less conservative, while the resulting tracking errors exhibit smaller oscillations and improved transient performance. Second, leveraging the time-varying bounds, a mixed-integer convex programming (MICP) formulation generates piecewise B\'ezier reference trajectories that satisfy STL and velocity limits, while ensuring inter-agent safety through convex-hull properties. Simulation results demonstrate that the proposed approach enables formally verifiable multi-agent coordination in constrained environments, with provable tracking guarantees under bounded disturbances.
A Novel Online Pseudospectral Method for Approximation of Nonlinear Systems Dynamics
This paper presents a novel online system identification approach utilizing the Chebyshev pseudospectral (PS) method to approximate the dynamics of a continuous-time nonlinear system. Unlike conventional periodic sampling, the proposed identification scheme employs aperiodic state sampling, leveraging the Chebyshev nodes to achieve the desired approximation accuracy. Unlike traditional off-line PS approaches, the scheme utilizes a moving time-window strategy to compute the sampling instants (Chebyshev nodes) forward in time for state measurements. Within each time window, the number of sampling instants is adaptively determined to meet the specified approximation accuracy. The Chebyshev basis is also shifted for each time window to accommodate arbitrary approximation intervals. The least-squares approach is employed to estimate the coefficients of the shifted Chebyshev basis functions using the measured state and its derivatives at the end of each window, resulting in a piecewise approximation of the drift dynamics of the system. In the second step, the identified drift dynamics are utilized to design an adaptive state estimator to reconstruct the continuous system states from the aperiodic state measurements. To address the smoothness of the piecewise approximated system dynamics, the Chebyshev coefficients are recomputed at the window transition instants, between two consecutive time windows, by enforcing continuity of the approximated function and its derivatives. In addition, analytical results are provided to determine the number of sampling instants within each time window, which guarantees the desired approximation accuracy. The boundedness of function approximation and state estimation errors is also proved analytically. Finally, numerical simulation results are included to validate the proposed scheme.
comment: 12 pages, 4 figures, Substantially extends our earlier paper accepted for ACC 2025 (Denver, 8 Jul 2025). Submitted to IEEE Transactions on Automatic Control; under review
Economic data-enabled predictive control using machine learning
In this paper, we propose a convex data-based economic predictive control method within the framework of data-enabled predictive control (DeePC). Specifically, we use a neural network to transform the system output into a new state space, where the nonlinear economic cost function of the underlying nonlinear system is approximated using a quadratic function expressed by the transformed output in the new state space. Both the neural network parameters and the coefficients of the quadratic function are learned from open-loop data of the system. Additionally, we reconstruct constrained output variables from the transformed output through learning an output reconstruction matrix; this way, the proposed economic DeePC can handle output constraints explicitly. The performance of the proposed method is evaluated via a case study in a simulated chemical process.
comment: 6 pages, 2 figures
Leveraging Reinforcement Learning and Koopman Theory for Enhanced Model Predictive Control Performance
This study presents an innovative approach to Model Predictive Control (MPC) by leveraging the powerful combination of Koopman theory and Deep Reinforcement Learning (DRL). By transforming nonlinear dynamical systems into a higher-dimensional linear regime, the Koopman operator facilitates the linear treatment of nonlinear behaviors, paving the way for more efficient control strategies. Our methodology harnesses the predictive prowess of Koopman-based models alongside the optimization capabilities of DRL, particularly using the Proximal Policy Optimization (PPO) algorithm, to enhance the controller's performance. The resulting end-to-end learning framework refines the predictive control policies to cater to specific operational tasks, optimizing both performance and economic efficiency. We validate our approach through rigorous NMPC and eNMPC case studies, demonstrating that the Koopman-RL controller outperforms traditional controllers by achieving higher stability, superior constraint satisfaction, and significant cost savings. The findings indicate that our model can be a robust tool for complex control tasks, offering valuable insights into future applications of RL in MPC.
comment: 21 pages, 8 figures, 2 author photos. Affiliations: Electrical Engineering Department, Penn State University. Keywords: Economic Nonlinear Model Predictive Control, (e)NMPC, Reinforcement Learning, Surrogate Models, System Identification
Safety and optimality in learning-based control at low computational cost
Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degrees-of-freedom robot arm.
comment: Accepted final version to appear in the IEEE Transactions on Automatic Control
VoI-Driven Joint Optimization of Control and Communication in Vehicular Digital Twin Network
The vision of sixth-generation (6G) wireless networks paves the way for the seamless integration of digital twins into vehicular networks, giving rise to a Vehicular Digital Twin Network (VDTN). The large amount of computing resources as well as the massive amount of spatial-temporal data in Digital Twin (DT) domain can be utilized to enhance the communication and control performance of Internet of Vehicle (IoV) systems. In this article, we first propose the architecture of VDTN, emphasizing key modules that center on functions related to the joint optimization of control and communication. We then delve into the intricacies of the multitimescale decision process inherent in joint optimization in VDTN, specifically investigating the dynamic interplay between control and communication. To facilitate the joint optimization, we define two Value of Information (VoI) concepts rooted in control performance. Subsequently, utilizing VoI as a bridge between control and communication, we introduce a novel joint optimization framework, which involves iterative processing of two Deep Reinforcement Learning (DRL) modules corresponding to control and communication to derive the optimal policy. Finally, we conduct simulations of the proposed framework applied to a platoon scenario to demonstrate its effectiveness in ensu
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.
Distributed Coordination of Grid-Forming and Grid-Following Inverters for Optimal Frequency Control in Power Systems
The large-scale integration of inverter-interfaced renewable energy sources presents significant challenges to maintaining power balance and nominal frequency in modern power systems. This paper studies grid-level coordinated control of grid-forming (GFM) and grid-following (GFL) inverter-based resources (IBRs) for scalable and optimal frequency control. We propose a fully distributed optimal frequency control algorithm based on the projected primal-dual gradient method and by leveraging the structure of the underlying physical system dynamics. The proposed algorithm i) restores the nominal system frequency while minimizing total control cost and enforcing IBR power capacity limits and line thermal constraints, and ii) operates in a distributed manner that only needs local measurements and neighbor-to-neighbor communication. In particular, when the line thermal constraints are disregarded, the proposed algorithm admits a fully local implementation that requires no communication, while still ensuring optimality and satisfying IBR power capacity limits. We establish the global asymptotic convergence of the algorithm using Lyapunov stability analysis. The effectiveness and optimality of the proposed algorithms are validated through high-fidelity, 100% inverter-based electromagnetic transient (EMT) simulations on the IEEE 39-bus system.
Energetic Resilience of Linear Driftless Systems
When a malfunction causes a control system to lose authority over a subset of its actuators, achieving a task may require spending additional energy in order to compensate for the effect of uncontrolled inputs. To understand this increase in energy, we introduce an energetic resilience metric that quantifies the maximal additional energy required to achieve finite-time regulation in linear driftless systems that suffer this malfunction. We first derive optimal control signals and minimum energies to achieve this task in both the nominal and malfunctioning systems. We then obtain a bound on the worst-case energy used by the malfunctioning system, and its exact expression in the special case of loss of authority over one actuator. Further considering this special case, we derive a bound on the metric for energetic resilience. A simulation example on a model of an underwater robot demonstrates that this bound is useful in quantifying the increased energy used by a system suffering such a malfunction.
comment: 6 pages, 1 figure
Sparse identification of nonlinear dynamics with high accuracy and reliability under noisy conditions for applications to industrial systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. The method is applied to the diesel engine airpath systems, which are known as a nonlinear complicated industrial system. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, we propose an emphasized SINDy by incorporating ensemble learning, elite gathering, and classification techniques while keeping convex calculation. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always show the same trends. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. The simulation results show that the proposed method realizes multi-step prediction for the airpath system, which is known to be a complicated industrial system under noisy conditions.
Coupled Rendezvous and Docking Maneuver control of satellite using Reinforcement learning-based Adaptive Fixed-Time Sliding Mode Controller
Satellite dynamics in unknown environments are inherently uncertain due to factors such as varying gravitational fields, atmospheric drag, and unpredictable interactions with space debris or other celestial bodies. Traditional sliding mode controllers with fixed parameters often struggle to maintain optimal performance under these fluctuating conditions. Therefore, an adaptive controller is essential to address these challenges by continuously tuning its gains in real-time. In this paper, we have tuned the slopes of the Fixed-time Sliding surface adaptively using reinforcement learning for coupled rendezvous and docking maneuver of chaser satellite with the target satellite in an unknown space environment. The neural network model is used to determine the optimal gains of reaching law of the fixed-time sliding surface. We have assumed that we don't have an accurate model of the system so we have added noise in the tangent space instead of directly on the manifold to preserve the geometric structure of the system while ensuring mathematically consistent uncertainty propagation. The reinforcement learning is used as an approximator to represent the value function of the agent to estimate the dynamical model of the system using the Actor-Critic method. The proposed control algorithm integrates a neural network and a sliding mode controller in a cascade loop architecture, where the tracking error dynamically tunes the sliding surface gains. Global fixed-time stability of the closed-loop feedback system is proved within the Lyapunov framework. This comprehensive approach of fixed-time sliding mode controller using a Reinforcement Learning based ensures the completion of the mission efficiently while addressing the critical challenges posed by the uncertain environment. The simulation results presented support the claims made.
Analysis of Power Losses and the Efficacy of Power Minimization Strategies in Multichannel Electrical Stimulation Systems
Neuroprosthetic devices require multichannel stimulator systems with an increasing number of channels. However, there are inherent power losses in typical multichannel stimulation circuits caused by a mismatch between the power supply voltage and the voltage required at each electrode to successfully stimulate tissue. This imposes a bottleneck towards high-channel-count devices, which is particularly severe in wirelessly-powered devices. Hence, advances in the power efficiency of stimulation systems are critical. To support these advances, this paper presents a methodology to identify and quantify power losses associated with different power supply scaling strategies in multichannel stimulation systems. The proposed methodology utilizes distributions of stimulation amplitudes and electrode impedances to calculate power losses in multichannel systems. Experimental data from previously published studies spanning various stimulation applications were analyzed to evaluate the performance of fixed, global, and stepped supply scaling methods, focusing on their impact on power dissipation and efficiency. Variability in output conditions results in low power efficiency in multichannel stimulation systems across all applications. Stepped voltage scaling demonstrated substantial efficiency improvements, achieving an increase of 67 % to 146 %, particularly in high-channel-count applications with significant variability in tissue impedance. Global scaling, by contrast, was more advantageous for systems with fewer channels. The findings highlight the importance of tailoring power management strategies to specific applications to optimize efficiency while minimizing system complexity. The proposed methodology offers a framework for evaluating efficiency-complexity trade-offs, advancing the design of scalable neurostimulation systems.
comment: 24 pages, 11 figures
Modeling False Data Injection Attacks in Integrated Electricity-Gas Systems
This work studies the modeling of false data injection attacks (FDIAs) in integrated electricity-gas systems (IEGSs). First, we introduce a static state estimation model and bad data detection method for IEGSs. Then, we develop FDIAs on IEGSs with complete network topology and parameter information. Next, we develop FDIAs on IEGSs when intruders have only local network topology and parameter information of an IEGS. Lastly, we explore FDIAs on IEGSs when intruders have only local network topology information of an IEGS.
comment: 13 pages
Learning to Drive Anywhere with Model-Based Reannotation
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Exploring Generative AI Techniques in Government: A Case Study
The swift progress of Generative Artificial intelligence (GenAI), notably Large Language Models (LLMs), is reshaping the digital landscape. Recognizing this transformative potential, the National Research Council of Canada (NRC) launched a pilot initiative to explore the integration of GenAI techniques into its daily operation for performance excellence, where 22 projects were launched in May 2024. Within these projects, this paper presents the development of the intelligent agent Pubbie as a case study, targeting the automation of performance measurement, data management and insight reporting at the NRC. Cutting-edge techniques are explored, including LLM orchestration and semantic embedding via RoBERTa, while strategic fine-tuning and few-shot learning approaches are incorporated to infuse domain knowledge at an affordable cost. The user-friendly interface of Pubbie allows general government users to input queries in natural language and easily upload or download files with a simple button click, greatly reducing manual efforts and accessibility barriers.
comment: In submission to IEEE Intelligent Systems
Uncertainty-aware data-driven predictive control in a stochastic setting
Data-Driven Predictive Control (DDPC) has been recently proposed as an effective alternative to traditional Model Predictive Control (MPC), in that the same constrained optimization problem can be addressed without the need to explicitly identify a full model of the plant. However, DDPC is built upon input/output trajectories. Therefore, the finite sample effect of stochastic data, due to, e.g., measurement noise, may have a detrimental impact on closed-loop performance. Exploiting a formal statistical analysis of the prediction error, in this paper we propose the first systematic approach to deal with uncertainty due to finite sample effects. To this end, we introduce two regularization strategies for which, differently from existing regularization-based DDPC techniques, we propose a tuning rationale allowing us to select the regularization hyper-parameters before closing the loop and without additional experiments. Simulation results confirm the potential of the proposed strategy when closing the loop.
comment: 6 pages, 1 figure, this work has been submitted and accepted for publication at the IFAC World Congress 2023, Yokohama, Japan
Consensus Building in Human-robot Co-learning via Bias Controlled Nonlinear Opinion Dynamics and Non-verbal Communication through Robotic Eyes
Consensus between humans and robots is crucial as robotic agents become more prevalent and deeply integrated into our daily lives. This integration presents both unprecedented opportunities and notable challenges for effective collaboration. However, the active guidance of human actions and their integration in co-learning processes, where humans and robots mutually learn from each other, remains under-explored. This article demonstrates how consensus between human and robot opinions can be established by modeling decision-making processes as non-linear opinion dynamics. We utilize dynamic bias as a control parameter to steer the robot's opinion toward consensus and employ visual cues via a robotic eye gaze to guide human decisions. These non-verbal cues communicate the robot's future intentions, gradually guiding human decisions to align with them. To design robot behavior for consensus, we integrate a human opinion observation algorithm with the robot's opinion formation, controlling its actions based on that formed opinion. Experiments with $51$ participants ($N=51$) in a two-choice decision-making task show that effective consensus and trust can be established in a human--robot co-learning setting by guiding human decisions through nonverbal robotic cues and using bias-controlled opinion dynamics to shape robot behavior. Finally, we provide detailed information on the perceived cognitive load and the behavior of robotic eyes based on user feedback and post-experiment interviews.
comment: 48 pages, 8 figures
A Prudent Framework for Understanding Risk-Awareness in Demand Response
We show that risk-aware behaviors in demand response originate from superquadratic state-dependent cost functions and price uncertainty with skewed distributions. We obtain such results through developing a novel theoretical demand response framework that combines non-anticipatory multi-stage decision-making with superquadratic cost functions. We introduce the concept of prudent demand, defined by a positive third-order derivative of the cost function, which is the first principle for risk-averse behavior despite a risk-neutral objective. Our analysis establishes that future price uncertainty affects immediate consumption decisions, and the extent of this response scales proportionally with the skewness of the price distribution. We visualize our theoretical findings through numerical simulations and illustrate their practical implications using a real-world case study.
The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning
Deep Reinforcement Learning (DRL) has demonstrated impressive results in domains such as games and robotics, where task formulations are well-defined. However, few DRL benchmarks are grounded in complex, real-world environments, where safety constraints, partial observability, and the need for hand-engineered task representations pose significant challenges. To help bridge this gap, we introduce a testbed based on the pump scheduling problem in a real-world water distribution facility. The task involves controlling pumps to ensure a reliable water supply while minimizing energy consumption and respecting the constraints of the system. Our testbed includes a realistic simulator, three years of high-resolution (1-minute) operational data from human-led control, and a baseline RL task formulation. This testbed supports a wide range of research directions, including offline RL, safe exploration, inverse RL, and multi-objective optimization.
Robotics
Terrain-aware Low Altitude Path Planning
In this paper, we study the problem of generating low altitude path plans for nap-of-the-earth (NOE) flight in real time with only RGB images from onboard cameras and the vehicle pose. We propose a novel training method that combines behavior cloning and self-supervised learning that enables the learned policy to outperform the policy trained with standard behavior cloning approach on this task. Simulation studies are performed on a custom canyon terrain.
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Human videos offer a scalable way to train robot manipulation policies, but lack the action labels needed by standard imitation learning algorithms. Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly. We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense and transferable signal for learning robot policies. X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards. These rewards are used to train a reinforcement learning (RL) policy in simulation. The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting. To transfer to the real world, X-Si introduces an online domain adaptation technique that aligns real and simulated observations during deployment. Importantly, X-Sim does not require any robot teleoperation data. We evaluate it across 5 manipulation tasks in 2 environments and show that it: (1) improves task progress by 30% on average over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with 10x less data collection time, and (3) generalizes to new camera viewpoints and test-time changes. Code and videos are available at https://portal-cornell.github.io/X-Sim/.
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Human drivers naturally possess the ability to perceive driving scenarios, predict potential hazards, and react instinctively due to their spatial and causal intelligence, which allows them to perceive, understand, predict, and interact with the 3D world both spatially and temporally. Autonomous vehicles, however, lack these capabilities, leading to challenges in effectively managing perception-related Safety of the Intended Functionality (SOTIF) risks, particularly in complex and unpredictable driving conditions. To address this gap, we propose an approach that fine-tunes multimodal language models (MLLMs) on a customized dataset specifically designed to capture perception-related SOTIF scenarios. Model benchmarking demonstrates that this tailored dataset enables the models to better understand and respond to these complex driving situations. Additionally, in real-world case studies, the proposed method correctly handles challenging scenarios that even human drivers may find difficult. Real-time performance tests further indicate the potential for the models to operate efficiently in live driving environments. This approach, along with the dataset generation pipeline, shows significant promise for improving the identification, cognition, prediction, and reaction to SOTIF-related risks in autonomous driving systems. The dataset and information are available: https://github.com/s95huang/DriveSOTIF.git
comment: V1 of the manuscript, submitted to IEEE-TVT; V2 revision in progress
VALISENS: A Validated Innovative Multi-Sensor System for Cooperative Automated Driving SC
Perception is a core capability of automated vehicles and has been significantly advanced through modern sensor technologies and artificial intelligence. However, perception systems still face challenges in complex real-world scenarios. To improve robustness against various external factors, multi-sensor fusion techniques are essential, combining the strengths of different sensor modalities. With recent developments in Vehicle-to-Everything (V2X communication, sensor fusion can now extend beyond a single vehicle to a cooperative multi-agent system involving Connected Automated Vehicle (CAV) and intelligent infrastructure. This paper presents VALISENS, an innovative multi-sensor system distributed across multiple agents. It integrates onboard and roadside LiDARs, radars, thermal cameras, and RGB cameras to enhance situational awareness and support cooperative automated driving. The thermal camera adds critical redundancy for perceiving Vulnerable Road User (VRU), while fusion with roadside sensors mitigates visual occlusions and extends the perception range beyond the limits of individual vehicles. We introduce the corresponding perception module built on this sensor system, which includes object detection, tracking, motion forecasting, and high-level data fusion. The proposed system demonstrates the potential of cooperative perception in real-world test environments and lays the groundwork for future Cooperative Intelligent Transport Systems (C-ITS) applications.
comment: 7 pages, 11 figures, submitted to IEEE ITSC
Reinforcement Learning-Based Monocular Vision Approach for Autonomous UAV Landing
This paper introduces an innovative approach for the autonomous landing of Unmanned Aerial Vehicles (UAVs) using only a front-facing monocular camera, therefore obviating the requirement for depth estimation cameras. Drawing on the inherent human estimating process, the proposed method reframes the landing task as an optimization problem. The UAV employs variations in the visual characteristics of a specially designed lenticular circle on the landing pad, where the perceived color and form provide critical information for estimating both altitude and depth. Reinforcement learning algorithms are utilized to approximate the functions governing these estimations, enabling the UAV to ascertain ideal landing settings via training. This method's efficacy is assessed by simulations and experiments, showcasing its potential for robust and accurate autonomous landing without dependence on complex sensor setups. This research contributes to the advancement of cost-effective and efficient UAV landing solutions, paving the way for wider applicability across various fields.
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation ICRA
In autonomous driving, thermal image semantic segmentation has emerged as a critical research area, owing to its ability to provide robust scene understanding under adverse visual conditions. In particular, unsupervised domain adaptation (UDA) for thermal image segmentation can be an efficient solution to address the lack of labeled thermal datasets. Nevertheless, since these methods do not effectively utilize the complementary information between RGB and thermal images, they significantly decrease performance during domain adaptation. In this paper, we present a comprehensive study on cross-spectral UDA for thermal image semantic segmentation. We first propose a novel masked mutual learning strategy that promotes complementary information exchange by selectively transferring results between each spectral model while masking out uncertain regions. Additionally, we introduce a novel prototypical self-supervised loss designed to enhance the performance of the thermal segmentation model in nighttime scenarios. This approach addresses the limitations of RGB pre-trained networks, which cannot effectively transfer knowledge under low illumination due to the inherent constraints of RGB sensors. In experiments, our method achieves higher performance over previous UDA methods and comparable performance to state-of-the-art supervised methods.
comment: 7 pages, 4 figures, International Conference on Robotics and Automation(ICRA) 2025
YOPOv2-Tracker: An End-to-End Agile Tracking and Navigation Framework from Perception to Action
Traditional target tracking pipelines including detection, mapping, navigation, and control are comprehensive but introduce high latency, limitting the agility of quadrotors. On the contrary, we follow the design principle of "less is more", striving to simplify the process while maintaining effectiveness. In this work, we propose an end-to-end agile tracking and navigation framework for quadrotors that directly maps the sensory observations to control commands. Importantly, leveraging the multimodal nature of navigation and detection tasks, our network maintains interpretability by explicitly integrating the independent modules of the traditional pipeline, rather than a crude action regression. In detail, we adopt a set of motion primitives as anchors to cover the searching space regarding the feasible region and potential target. Then we reformulate the trajectory optimization as regression of primitive offsets and associated costs considering the safety, smoothness, and other metrics. For tracking task, the trajectories are expected to approach the target and additional objectness scores are predicted. Subsequently, the predictions, after compensation for the estimated lumped disturbance, are transformed into thrust and attitude as control commands for swift response. During training, we seamlessly integrate traditional motion planning with deep learning by directly back-propagating the gradients of trajectory costs to the network, eliminating the need for expert demonstration in imitation learning and providing more direct guidance than reinforcement learning. Finally, we deploy the algorithm on a compact quadrotor and conduct real-world validations in both forest and building environments to demonstrate the efficiency of the proposed method.
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
The first WARA Robotics Mobile Manipulation Challenge, held in December 2024 at ABB Corporate Research in V\"aster{\aa}s, Sweden, addressed the automation of task-intensive and repetitive manual labor in laboratory environments - specifically the transport and cleaning of glassware. Designed in collaboration with AstraZeneca, the challenge invited academic teams to develop autonomous robotic systems capable of navigating human-populated lab spaces and performing complex manipulation tasks, such as loading items into industrial dishwashers. This paper presents an overview of the challenge setup, its industrial motivation, and the four distinct approaches proposed by the participating teams. We summarize lessons learned from this edition and propose improvements in design to enable a more effective second iteration to take place in 2025. The initiative bridges an important gap in effective academia-industry collaboration within the domain of autonomous mobile manipulation systems by promoting the development and deployment of applied robotic solutions in real-world laboratory contexts.
Realistic Counterfactual Explanations for Machine Learning-Controlled Mobile Robots using 2D LiDAR
This paper presents a novel method for generating realistic counterfactual explanations (CFEs) in machine learning (ML)-based control for mobile robots using 2D LiDAR. ML models, especially artificial neural networks (ANNs), can provide advanced decision-making and control capabilities by learning from data. However, they often function as black boxes, making it challenging to interpret them. This is especially a problem in safety-critical control applications. To generate realistic CFEs, we parameterize the LiDAR space with simple shapes such as circles and rectangles, whose parameters are chosen by a genetic algorithm, and the configurations are transformed into LiDAR data by raycasting. Our model-agnostic approach generates CFEs in the form of synthetic LiDAR data that resembles a base LiDAR state but is modified to produce a pre-defined ML model control output based on a query from the user. We demonstrate our method on a mobile robot, the TurtleBot3, controlled using deep reinforcement learning (DRL) in real-world and simulated scenarios. Our method generates logical and realistic CFEs, which helps to interpret the DRL agent's decision making. This paper contributes towards advancing explainable AI in mobile robotics, and our method could be a tool for understanding, debugging, and improving ML-based autonomous control.
comment: Accepted for publication at the 2025 European Control Conference (ECC)
FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots
Reinforcement learning (RL) has made significant strides in legged robot control, enabling locomotion across diverse terrains and complex loco-manipulation capabilities. However, the commonly used position or velocity tracking-based objectives are agnostic to forces experienced by the robot, leading to stiff and potentially dangerous behaviors and poor control during forceful interactions. To address this limitation, we present \emph{Force-Adaptive Control via Impedance Reference Tracking} (FACET). Inspired by impedance control, we use RL to train a control policy to imitate a virtual mass-spring-damper system, allowing fine-grained control under external forces by manipulating the virtual spring. In simulation, we demonstrate that our quadruped robot achieves improved robustness to large impulses (up to 200 Ns) and exhibits controllable compliance, achieving an 80% reduction in collision impulse. The policy is deployed to a physical robot to showcase both compliance and the ability to engage with large forces by kinesthetic control and pulling payloads up to 2/3 of its weight. Further extension to a legged loco-manipulator and a humanoid shows the applicability of our method to more complex settings to enable whole-body compliance control. Project Website: https://egalahad.github.io/facet/
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
Autonomous driving has made significant strides through data-driven techniques, achieving robust performance in standardized tasks. However, existing methods frequently overlook user-specific preferences, offering limited scope for interaction and adaptation with users. To address these challenges, we propose a "fast-slow" decision-making framework that integrates a Large Language Model (LLM) for high-level instruction parsing with a Reinforcement Learning (RL) agent for low-level real-time decision. In this dual system, the LLM operates as the "slow" module, translating user directives into structured guidance, while the RL agent functions as the "fast" module, making time-critical maneuvers under stringent latency constraints. By decoupling high-level decision making from rapid control, our framework enables personalized user-centric operation while maintaining robust safety margins. Experimental evaluations across various driving scenarios demonstrate the effectiveness of our method. Compared to baseline algorithms, the proposed architecture not only reduces collision rates but also aligns driving behaviors more closely with user preferences, thereby achieving a human-centric mode. By integrating user guidance at the decision level and refining it with real-time control, our framework bridges the gap between individual passenger needs and the rigor required for safe, reliable driving in complex traffic environments.
Efficient Robotic Policy Learning via Latent Space Backward Planning ICML 2025
Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io
comment: Accepted by ICML 2025
Beyond Patterns: Harnessing Causal Logic for Autonomous Driving Trajectory Prediction
Accurate trajectory prediction has long been a major challenge for autonomous driving (AD). Traditional data-driven models predominantly rely on statistical correlations, often overlooking the causal relationships that govern traffic behavior. In this paper, we introduce a novel trajectory prediction framework that leverages causal inference to enhance predictive robustness, generalization, and accuracy. By decomposing the environment into spatial and temporal components, our approach identifies and mitigates spurious correlations, uncovering genuine causal relationships. We also employ a progressive fusion strategy to integrate multimodal information, simulating human-like reasoning processes and enabling real-time inference. Evaluations on five real-world datasets--ApolloScape, nuScenes, NGSIM, HighD, and MoCAD--demonstrate our model's superiority over existing state-of-the-art (SOTA) methods, with improvements in key metrics such as RMSE and FDE. Our findings highlight the potential of causal reasoning to transform trajectory prediction, paving the way for robust AD systems.
Secure Safety Filter: Towards Safe Flight Control under Sensor Attacks IROS 2025
Modern autopilot systems are prone to sensor attacks that can jeopardize flight safety. To mitigate this risk, we proposed a modular solution: the secure safety filter, which extends the well-established control barrier function (CBF)-based safety filter to account for, and mitigate, sensor attacks. This module consists of a secure state reconstructor (which generates plausible states) and a safety filter (which computes the safe control input that is closest to the nominal one). Differing from existing work focusing on linear, noise-free systems, the proposed secure safety filter handles bounded measurement noise and, by leveraging reduced-order model techniques, is applicable to the nonlinear dynamics of drones. Software-in-the-loop simulations and drone hardware experiments demonstrate the effectiveness of the secure safety filter in rendering the system safe in the presence of sensor attacks.
comment: Submitted to IROS 2025
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Open-vocabulary, task-oriented grasping of specific functional parts, particularly with dual arms, remains a key challenge, as current Vision-Language Models (VLMs), while enhancing task understanding, often struggle with precise grasp generation within defined constraints and effective dual-arm coordination. We innovatively propose UniDiffGrasp, a unified framework integrating VLM reasoning with guided part diffusion to address these limitations. UniDiffGrasp leverages a VLM to interpret user input and identify semantic targets (object, part(s), mode), which are then grounded via open-vocabulary segmentation. Critically, the identified parts directly provide geometric constraints for a Constrained Grasp Diffusion Field (CGDF) using its Part-Guided Diffusion, enabling efficient, high-quality 6-DoF grasps without retraining. For dual-arm tasks, UniDiffGrasp defines distinct target regions, applies part-guided diffusion per arm, and selects stable cooperative grasps. Through extensive real-world deployment, UniDiffGrasp achieves grasp success rates of 0.876 in single-arm and 0.767 in dual-arm scenarios, significantly surpassing existing state-of-the-art methods, demonstrating its capability to enable precise and coordinated open-vocabulary grasping in complex real-world scenarios.
comment: 8 pages, 5 figures
Dynamic Safety in Complex Environments: Synthesizing Safety Filters with Poisson's Equation
Synthesizing safe sets for robotic systems operating in complex and dynamically changing environments is a challenging problem. Solving this problem can enable the construction of safety filters that guarantee safe control actions -- most notably by employing Control Barrier Functions (CBFs). This paper presents an algorithm for generating safe sets from perception data by leveraging elliptic partial differential equations, specifically Poisson's equation. Given a local occupancy map, we solve Poisson's equation subject to Dirichlet boundary conditions, with a novel forcing function. Specifically, we design a smooth guidance vector field, which encodes gradient information required for safety. The result is a variational problem for which the unique minimizer -- a safety function -- characterizes the safe set. After establishing our theoretical result, we illustrate how safety functions can be used in CBF-based safety filtering. The real-time utility of our synthesis method is highlighted through hardware demonstrations on quadruped and humanoid robots navigating dynamically changing obstacle-filled environments.
cpRRTC: GPU-Parallel RRT-Connect for Constrained Motion Planning
Motion planning is a fundamental problem in robotics that involves generating feasible trajectories for a robot to follow. Recent advances in parallel computing, particularly through CPU and GPU architectures, have significantly reduced planning times to the order of milliseconds. However, constrained motion planning especially using sampling based methods on GPUs remains underexplored. Prior work such as pRRTC leverages a tracking compiler with a CUDA backend to accelerate forward kinematics and collision checking. While effective in simple settings, their approach struggles with increased complexity in robot models or environments. In this paper, we propose a novel GPU based framework utilizing NVRTC for runtime compilation, enabling efficient handling of high complexity scenarios and supporting constrained motion planning. Experimental results demonstrate that our method achieves superior performance compared to existing approaches.
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like travel distance and number of trials. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics. Our source code is available here: https://github.com/quangpham2006/SmallPlan
comment: Paper is under review
Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms
Recent imitation learning policies, often framed as time series prediction tasks, directly map robotic observations-such as high-dimensional visual data and proprioception-into the action space. While time series prediction primarily relies on spatial domain modeling, the underutilization of frequency domain analysis in robotic manipulation trajectory prediction may lead to neglecting the inherent temporal information embedded within action sequences. To address this, we reframe imitation learning policies through the lens of the frequency domain and introduce the Wavelet Policy. This novel approach employs wavelet transforms (WT) for feature preprocessing and extracts multi-scale features from the frequency domain using the SE2MD (Single Encoder to Multiple Decoder) architecture. Furthermore, to enhance feature mapping in the frequency domain and increase model capacity, we introduce a Learnable Frequency-Domain Filter (LFDF) after each frequency decoder, improving adaptability under different visual conditions. Our results show that the Wavelet Policy outperforms state-of-the-art (SOTA) end-to-end methods by over 10% on four challenging robotic arm tasks, while maintaining a comparable parameter count. In long-range settings, its performance declines more slowly as task volume increases. The source code is available at https://github.com/lurenjia384/Wavelet_Policy.
iTeach: Interactive Teaching for Robot Perception using Mixed Reality
We introduce iTeach, a human-in-the-loop Mixed Reality (MR) system that enhances robot perception through interactive teaching. Our system enables users to visualize robot perception outputs such as object detection and segmentation results using a MR device. Therefore, users can inspect failures of perception models using the system on real robots. Moreover, iTeach facilitates real-time, informed data collection and annotation, where users can use hand gesture, eye gaze and voice commands to annotate images collected from robots. The annotated images can be used to fine-tune perception models to improve their accuracy and adaptability. The system continually improves perception models by collecting annotations of failed examples from users. When applied to object detection and unseen object instance segmentation (UOIS) tasks, iTeach demonstrates encouraging results in improving pre-trained vision models for these two tasks. These results highlight the potential of MR to make robotic perception systems more capable and adaptive in real-world environments. Project page at https://irvlutd.github.io/iTeach.
A Clinical Tuning Framework for Continuous Kinematic and Impedance Control of a Powered Knee-Ankle Prosthesis
Configuring a prosthetic leg is an integral part of the fitting process, but the personalization of a multi-modal powered knee-ankle prosthesis is often too complex to realize in a clinical environment. This paper develops both the technical means to individualize a hybrid kinematic-impedance controller for variable-incline walking and sit-stand transitions, and an intuitive Clinical Tuning Interface (CTI) that allows prosthetists to directly modify the controller behavior. Utilizing an established method for predicting kinematic gait individuality alongside a new parallel approach for kinetic individuality, we personalize continuous-phase/task models of joint impedance (during stance) and kinematics (during swing) using tuned characteristics exclusively from level-ground walking. To take advantage of this method, we developed a CTI that translates common clinical tuning parameters into model adjustments for the walking and sit-stand controllers. We then conducted a case study where a prosthetist iteratively tuned the powered prosthesis to an above-knee amputee participant in a simulated clinical session involving sit-stand transitions and level walking, from which incline/decline walking features were automatically calibrated. The prosthetist fully tuned the multi-activity prosthesis controller in under 20 min. Each iteration of tuning (i.e., observation, parameter adjustment, and model reprocessing) took 2 min on average for walking and 1 min on average for sit-stand. The tuned behavior changes were appropriately manifested in the commanded prosthesis torques, both at the manually tuned tasks and automatically tuned tasks (inclines). This paper introduces a clinical tuning interface that simplifies the tuning process for multimodal robotic prosthetic legs, reducing the time required from several hours to just 20 min thus improving clinical feasibility.
comment: Published in IEEE JTEHM. IEEE Journal of Translational Engineering in Health and Medicine (2025)
Bilevel Learning for Bilevel Planning
A robot that learns from demonstrations should not just imitate what it sees -- it should understand the high-level concepts that are being demonstrated and generalize them to new tasks. Bilevel planning is a hierarchical model-based approach where predicates (relational state abstractions) can be leveraged to achieve compositional generalization. However, previous bilevel planning approaches depend on predicates that are either hand-engineered or restricted to very simple forms, limiting their scalability to sophisticated, high-dimensional state spaces. To address this limitation, we present IVNTR, the first bilevel planning approach capable of learning neural predicates directly from demonstrations. Our key innovation is a neuro-symbolic bilevel learning framework that mirrors the structure of bilevel planning. In IVNTR, symbolic learning of the predicate "effects" and neural learning of the predicate "functions" alternate, with each providing guidance for the other. We evaluate IVNTR in six diverse robot planning domains, demonstrating its effectiveness in abstracting various continuous and high-dimensional states. While most existing approaches struggle to generalize (with <35% success rate), our IVNTR achieves an average of 77% success rate on unseen tasks. Additionally, we showcase IVNTR on a mobile manipulator, where it learns to perform real-world mobile manipulation tasks and generalizes to unseen test scenarios that feature new objects, new states, and longer task horizons. Our findings underscore the promise of learning and planning with abstractions as a path towards high-level generalization.
comment: In Proceedings of Robotics, Science, and Systems (RSS 2025). See our website for details: https://jaraxxus-me.github.io/IVNTR/
Near-optimal Sensor Placement for Detecting Stochastic Target Trajectories in Barrier Coverage Systems
This paper addresses the deployment of sensors for a 2-D barrier coverage system. The challenge is to compute near-optimal sensor placements for detecting targets whose trajectories follow a log-Gaussian Cox line process. We explore sensor deployment in a transformed space, where linear target trajectories are represented as points. While this space simplifies handling the line process, the spatial functions representing sensor performance (i.e. probability of detection) become less intuitive. To illustrate our approach, we focus on positioning sensors of the barrier coverage system on the seafloor to detect passing ships. Through numerical experiments using historical ship data, we compute sensor locations that maximize the probability all ship passing over the barrier coverage system are detected.
comment: This work is published in IEEE SysCon 2025
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Building autonomous robotic agents capable of achieving human-level performance in real-world embodied tasks is an ultimate goal in humanoid robot research. Recent advances have made significant progress in high-level cognition with Foundation Models (FMs) and low-level skill development for humanoid robots. However, directly combining these components often results in poor robustness and efficiency due to compounding errors in long-horizon tasks and the varied latency of different modules. We introduce Being-0, a hierarchical agent framework that integrates an FM with a modular skill library. The FM handles high-level cognitive tasks such as instruction understanding, task planning, and reasoning, while the skill library provides stable locomotion and dexterous manipulation for low-level control. To bridge the gap between these levels, we propose a novel Connector module, powered by a lightweight vision-language model (VLM). The Connector enhances the FM's embodied capabilities by translating language-based plans into actionable skill commands and dynamically coordinating locomotion and manipulation to improve task success. With all components, except the FM, deployable on low-cost onboard computation devices, Being-0 achieves efficient, real-time performance on a full-sized humanoid robot equipped with dexterous hands and active vision. Extensive experiments in large indoor environments demonstrate Being-0's effectiveness in solving complex, long-horizon tasks that require challenging navigation and manipulation subtasks. For further details and videos, visit https://beingbeyond.github.io/Being-0.
MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Internal Models
Controlling hands in high-dimensional action space has been a longstanding challenge, yet humans naturally perform dexterous tasks with ease. In this paper, we draw inspiration from the concept of internal model exhibited in human behavior and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that includes a couple of neural networks (NNs) capturing the dynamical characteristics of hands and a bidirectional planning approach, which demonstrates both training and planning efficiency. To show the versatility of MoDex, we further integrate it with an external model to manipulate in-hand objects and a large language model (LLM) to generate various gestures in both simulation and real world. Extensive experiments on different dexterous hands address the data efficiency in learning a new task and the transferability between different tasks.
comment: 21 pages
Neural Predictor for Flight Control with Payload
Aerial robotics for transporting suspended payloads as the form of freely-floating manipulator are growing great interest in recent years. However, the force/torque caused by payload and residual dynamics will introduce unmodeled perturbations to the aerial robotics, which negatively affects the closed-loop performance. Different from estimation-like methods, this paper proposes Neural Predictor, a learning-based approach to model force/torque induced by payload and residual dynamics as a dynamical system. It yields a hybrid model that combines the first-principles dynamics with the learned dynamics. The hybrid model is then integrated into a MPC framework to improve closed-loop performance. Effectiveness of proposed framework is verified extensively in both numerical simulations and real-world flight experiments. The results indicate that our approach can capture force/torque caused by suspended payload and residual dynamics accurately, respond quickly to the changes of them and improve the closed-loop performance significantly. In particular, Neural Predictor outperforms a state-of-the-art learning-based estimator and has reduced the force and torque estimation errors by up to 66.15% and 33.33% while requiring less samples. The code of proposed Neural Predictor can be found at https://github.com/NPU-RCIR/Neural-Predictor.git.
comment: This paper (longer version) has been accepted in RA-L
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering
Identifying predictive world models for robots in novel environments from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable programming to identify world models are incapable of jointly optimizing the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows the joint identification of these properties. Our method employs a novel differentiable point-based geometry representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, we achieve end-to-end optimization of world models, given the sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only one robot action sequence. The code and additional videos are available at our project website: https://tianyi20.github.io/rigid-world-model.github.io/
comment: 8 pages, 8 figures. Published at IEEE Robotics Automation Letters
Multiagent Systems
Constant-Memory Strategies in Stochastic Games: Best Responses and Equilibria
(Here is a short version, see our paper for the complete abstract.) In this work, we comprehensively investigate the concept of constant-memory strategies in stochastic games. We first establish some results on best responses and Nash equilibria for behavioral constant-memory strategies, followed by a discussion on the computational hardness of best responding to mixed constant-memory strategies. Those theoretic insights later empower a generative framework for studying generalizability of single-agent RL algorithms.
comment: 19 pages, ongoing work
The Wisdom of Agent Crowds: A Human-AI Interaction Innovation Ignition Framework
With the widespread application of large AI models in various fields, the automation level of multi-agent systems has been continuously improved. However, in high-risk decision-making scenarios such as healthcare and finance, human participation and the alignment of intelligent systems with human intentions remain crucial. This paper focuses on the financial scenario and constructs a multi-agent brainstorming framework based on the BDI theory. A human-computer collaborative multi-agent financial analysis process is built using Streamlit. The system plans tasks according to user intentions, reduces users' cognitive load through real-time updated structured text summaries and the interactive Cothinker module, and reasonably integrates general and reasoning large models to enhance the ability to handle complex problems. By designing a quantitative analysis algorithm for the sentiment tendency of interview content based on LLMs and a method for evaluating the diversity of ideas generated by LLMs in brainstorming based on k-means clustering and information entropy, the system is comprehensively evaluated. The results of human factors testing show that the system performs well in terms of usability and user experience. Although there is still room for improvement, it can effectively support users in completing complex financial tasks. The research shows that the system significantly improves the efficiency of human-computer interaction and the quality of decision-making in financial decision-making scenarios, providing a new direction for the development of related fields.
Systems and Control (CS)
AI-Driven Optimization of Wave-Controlled Reconfigurable Intelligent Surfaces
A promising type of Reconfigurable Intelligent Surface (RIS) employs tunable control of its varactors using biasing transmission lines below the RIS reflecting elements. Biasing standing waves (BSWs) are excited by a time-periodic signal and sampled at each RIS element to create a desired biasing voltage and control the reflection coefficients of the elements. A simple rectifier can be used to sample the voltages and capture the peaks of the BSWs over time. Like other types of RIS, attempting to model and accurately configure a wave-controlled RIS is extremely challenging due to factors such as device non-linearities, frequency dependence, element coupling, etc., and thus significant differences will arise between the actual and assumed performance. An alternative approach to solving this problem is data-driven: Using training data obtained by sampling the reflected radiation pattern of the RIS for a set of BSWs, a neural network (NN) is designed to create an input-output map between the BSW amplitudes and the resulting sampled radiation pattern. This is the approach discussed in this paper. In the proposed approach, the NN is optimized using a genetic algorithm (GA) to minimize the error between the predicted and measured radiation patterns. The BSW amplitudes are then designed via Simulated Annealing (SA) to optimize a signal-to-leakage-plus-noise ratio measure by iteratively forward-propagating the BSW amplitudes through the NN and using its output as feedback to determine convergence. The resulting optimal solutions are stored in a lookup table to be used both as settings to instantly configure the RIS and as a basis for determining more complex radiation patterns.
comment: 22 pages, 11 figures, 3 tables
OPTIKS: Optimized Gradient Properties Through Timing in K-Space
A customizable method (OPTIKS) for designing fast trajectory-constrained gradient waveforms with optimized time domain properties was developed. Given a specified multidimensional k-space trajectory, the method optimizes traversal speed (and therefore timing) with position along the trajectory. OPTIKS facilitates optimization of objectives dependent on the time domain gradient waveform and the arc-length domain k-space speed. OPTIKS is applied to design waveforms which limit peripheral nerve stimulation (PNS), minimize mechanical resonance excitation, and reduce acoustic noise. A variety of trajectory examples are presented including spirals, circular echo-planar-imaging, and rosettes. Design performance is evaluated based on duration, standardized PNS models, field measurements, gradient coil back-EMF measurements, and calibrated acoustic measurements. We show reductions in back-EMF of up to 94% and field oscillations up to 91.1%, acoustic noise decreases of up to 9.22 dB, and with efficient use of PNS models speed increases of up to 11.4%. The design method implementation is made available as an open source Python package through GitHub.
YANNs: Y-wise Affine Neural Networks for Exact and Efficient Representations of Piecewise Linear Functions
This work formally introduces Y-wise Affine Neural Networks (YANNs), a fully-explainable network architecture that continuously and efficiently represent piecewise affine functions with polytopic subdomains. Following from the proofs, it is shown that the development of YANNs requires no training to achieve the functionally equivalent representation. YANNs thus maintain all mathematical properties of the original formulations. Multi-parametric model predictive control is utilized as an application showcase of YANNs, which theoretically computes optimal control laws as a piecewise affine function of states, outputs, setpoints, and disturbances. With the exact representation of multi-parametric control laws, YANNs retain essential control-theoretic guarantees such as recursive feasibility and stability. This sets YANNs apart from the existing works which apply neural networks for approximating optimal control laws instead of exactly representing them. By optimizing the inference speed of the networks, YANNs can evaluate substantially faster in real-time compared to traditional piecewise affine function calculations. Numerical case studies are presented to demonstrate the algorithmic scalability with respect to the input/output dimensions and the number of subdomains. YANNs represent a significant advancement in control as the first neural network-based controller that inherently ensures both feasibility and stability. Future applications can leverage them as an efficient and interpretable starting point for data-driven modeling/control.
Design and Experimental Test of Datatic Approximate Optimal Filter in Nonlinear Dynamic Systems
Filtering is crucial in engineering fields, providing vital state estimation for control systems. However, the nonlinear nature of complex systems and the presence of non-Gaussian noises pose significant challenges to the performance of conventional filtering methods in terms of estimation accuracy and computational efficiency. In this work, we present a data-driven closed-loop filter, termed datatic approximate optimal filter (DAOF), specifically designed for nonlinear systems under non-Gaussian conditions. We first formulate a Markovian filtering problem (MFP), which inherently shares a connection with reinforcement learning (RL) as it aims to compute the optimal state estimate by minimizing the accumulated error. To solve MFP, we propose DAOF, which primarily incorporates a trained RL policy and features two distinct structural designs: DAOF-v1 and DAOF-v2. Designed for systems with explicit models, DAOF-v1 combines prediction and update phases, with the RL policy generating the update value. Meanwhile, DAOF-v2 bypasses system modeling by directly outputting the state estimate. Then, we utilize an actor-critic algorithm to learn the parameterized policy for DAOF. Experimental results on a 2-degree-of-freedom (2-DOF) vehicle system, equipped with explicit system models, demonstrate the superior accuracy and computational efficiency of DAOF-v1 compared to existing nonlinear filters. Moreover, DAOF-v2 showcases its unique ability to perform filtering without requiring explicit system modeling, as validated by a 14-DOF vehicle system.
Decentralized Stealth Attacks on Cyber-Physical Systems
Decentralized stealth attack constructions that minimize the mutual information between the state variables and the measurements are proposed. The attack constructions are formulated as random Gaussian attacks targeting Cyber-physical systems that aims at minimizing the mutual information between the state variables and measurements while constraining the Kullback-Leibler divergence between the distribution of the measurements under attacks and the distribution of the measurements without attacks. The proposed information metrics adopted measure the disruption and attack detection both globally and locally. The decentralized attack constructions are formulated in a framework of normal games. The global and local information metrics yield games with global and local objectives in disruption and attack detection. We have proven the games are potential games and the convexity of the potential functions followed by the uniqueness and the achievability of the Nash Equilibrium, accordingly. We proposed a best response dynamics to achieve the Nash Equilibrium of the games. We numerically evaluate the performance of the proposed decentralized stealth random attacks on IEEE test systems and show it is feasible to exploit game theoretic techniques in decentralized attack constructions.
comment: 14 pages, 6 figures, 3 pages for appendix. Submitted to IEEE Transactions on Information Forensics and Security
Robust Control of Uncertain Switched Affine Systems via Scenario Optimization
Switched affine systems are often used to model and control complex dynamical systems that operate in multiple modes. However, uncertainties in the system matrices can challenge their stability and performance. This paper introduces a new approach for designing switching control laws for uncertain switched affine systems using data-driven scenario optimization. Instead of relaxing invariant sets, our method creates smaller invariant sets with quadratic Lyapunov functions through scenario-based optimization, effectively reducing chattering effects and regulation error. The framework ensures robustness against parameter uncertainties while improving accuracy. We validate our method with applications in multi-objective interval Markov decision processes and power electronic converters, demonstrating its effectiveness.
Optimal Current Control Strategy for Reliable Power Electronics Converters: Frequency-Domain Approach
Power electronics converters are key enablers in the global energy transition for power generation, industrial and mobility applications; they convert electrical power in a controlled, reliable and efficient manner. The semiconductor switching devices, at the core of power converters, are the most likely component to fail due to the damage caused by the current-induced temperature cycling. Damage models of semiconductors have been developed and employed to study their reliability, improve their design and to estimate the lifetime of the converter in various power applications. However, those models can offer more if employed in the design of strategies to actively operate the converter. Specifically, properly controlling the current, and hence the temperature cycling, can effectively contribute to reducing the accumulated damage in the semiconductor and increase its reliability and lifetime. In this paper we propose a novel current control approach that integrates reliability requirements into the design framework, based on a frequency-domain model of the semiconductor damage.
Time-Modulated EM Skins for Integrated Sensing and Communications
An innovative solution, based on the exploitation of the harmonic beams generated by time-modulated electromagnetic skins (TM-EMSs), is proposed for the implementation of integrated sensing and communication (ISAC) functionalities in a Smart Electromagnetic Environment (SEME) scenario. More in detail, the field radiated by a user terminal, located at an unknown position, is assumed to illuminate a passive TM-EMS that, thanks to a suitable modulation of the local reflection coefficients at the meta-atom level of the EMS surface, simultaneously reflects towards a receiving base station (BS) a "sum" beam and a "difference" one at slightly different frequencies. By processing the received signals and exploiting monopulse radar tracking concepts, the BS both localizes the user terminal and, as a by-product, establishes a communication link with it by leveraging on the "sum" reflected beam. Towards this purpose, the arising harmonic beam control problem is reformulated as a global optimization one, which is successively solved by means of an evolutionary iterative approach to determine the desired TM-EMS modulation sequence. The results from selected numerical and experimental tests are reported to assess the effectiveness and the reliability of the proposed approach.
Realistic Counterfactual Explanations for Machine Learning-Controlled Mobile Robots using 2D LiDAR
This paper presents a novel method for generating realistic counterfactual explanations (CFEs) in machine learning (ML)-based control for mobile robots using 2D LiDAR. ML models, especially artificial neural networks (ANNs), can provide advanced decision-making and control capabilities by learning from data. However, they often function as black boxes, making it challenging to interpret them. This is especially a problem in safety-critical control applications. To generate realistic CFEs, we parameterize the LiDAR space with simple shapes such as circles and rectangles, whose parameters are chosen by a genetic algorithm, and the configurations are transformed into LiDAR data by raycasting. Our model-agnostic approach generates CFEs in the form of synthetic LiDAR data that resembles a base LiDAR state but is modified to produce a pre-defined ML model control output based on a query from the user. We demonstrate our method on a mobile robot, the TurtleBot3, controlled using deep reinforcement learning (DRL) in real-world and simulated scenarios. Our method generates logical and realistic CFEs, which helps to interpret the DRL agent's decision making. This paper contributes towards advancing explainable AI in mobile robotics, and our method could be a tool for understanding, debugging, and improving ML-based autonomous control.
comment: Accepted for publication at the 2025 European Control Conference (ECC)
Nonlinear Model Predictive Control for Leaderless UAV Formation Flying with Collision Avoidance under Directed Graphs
This paper studies the leaderless formation flying problem with collision avoidance for a group of unmanned aerial vehicles (UAVs), which requires the UAVs to navigate through cluttered environments without colliding while maintaining the formation. The communication network among the UAVs is structured as a directed graph that includes a directed spanning tree. A novel distributed nonlinear model predictive control (NMPC) method based on the model reference adaptive consensus (MRACon) framework is proposed. Within this framework, each UAV tracks an assigned reference output generated by a linear reference model that utilizes relative measurements as input. Subsequently, the NMPC method penalizes the tracking error between the output of the reference model and that of the actual model while also establishing constraint sets for collision avoidance and physical limitations to achieve distributed and safe formation control. Finally, simulations and hardware experiments are conducted to verify the effectiveness of the proposed method.
Secure Safety Filter: Towards Safe Flight Control under Sensor Attacks IROS 2025
Modern autopilot systems are prone to sensor attacks that can jeopardize flight safety. To mitigate this risk, we proposed a modular solution: the secure safety filter, which extends the well-established control barrier function (CBF)-based safety filter to account for, and mitigate, sensor attacks. This module consists of a secure state reconstructor (which generates plausible states) and a safety filter (which computes the safe control input that is closest to the nominal one). Differing from existing work focusing on linear, noise-free systems, the proposed secure safety filter handles bounded measurement noise and, by leveraging reduced-order model techniques, is applicable to the nonlinear dynamics of drones. Software-in-the-loop simulations and drone hardware experiments demonstrate the effectiveness of the secure safety filter in rendering the system safe in the presence of sensor attacks.
comment: Submitted to IROS 2025
Secure Safety Filter Design for Sampled-data Nonlinear Systems under Sensor Spoofing Attacks
This paper presents a secure safety filter design for nonlinear systems under sensor spoofing attacks. Existing approaches primarily focus on linear systems which limits their applications in real-world scenarios. In this work, we extend these results to nonlinear systems in a principled way. We introduce exact observability maps that abstract specific state estimation algorithms and extend them to a secure version capable of handling sensor attacks. Our generalization also applies to the relaxed observability case, with slightly relaxed guarantees. More importantly, we propose a secure safety filter design in both exact and relaxed cases, which incorporates secure state estimation and a control barrier function-enabled safety filter. The proposed approach provides theoretical safety guarantees for nonlinear systems in the presence of sensor attacks. We numerically validate our analysis on a unicycle vehicle equipped with redundant yet partly compromised sensors.
comment: Submitted to IEEE CDC 2025
Dynamic Safety in Complex Environments: Synthesizing Safety Filters with Poisson's Equation
Synthesizing safe sets for robotic systems operating in complex and dynamically changing environments is a challenging problem. Solving this problem can enable the construction of safety filters that guarantee safe control actions -- most notably by employing Control Barrier Functions (CBFs). This paper presents an algorithm for generating safe sets from perception data by leveraging elliptic partial differential equations, specifically Poisson's equation. Given a local occupancy map, we solve Poisson's equation subject to Dirichlet boundary conditions, with a novel forcing function. Specifically, we design a smooth guidance vector field, which encodes gradient information required for safety. The result is a variational problem for which the unique minimizer -- a safety function -- characterizes the safe set. After establishing our theoretical result, we illustrate how safety functions can be used in CBF-based safety filtering. The real-time utility of our synthesis method is highlighted through hardware demonstrations on quadruped and humanoid robots navigating dynamically changing obstacle-filled environments.
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator using noisy samples and widely used in optimization and Reinforcement Learning (RL). The noise in RL exhibits a Markovian structure, and in some cases, such as gradient temporal difference (GTD) methods, SA is employed in a two-time-scale framework. This combination introduces significant theoretical challenges for analysis. We derive an upper bound on the error for the iterations of linear two-time-scale SA with Markovian noise. We demonstrate that the mean squared error decreases as $trace (\Sigma^y)/k + o(1/k)$ where $k$ is the number of iterates, and $\Sigma^y$ is an appropriately defined covariance matrix. A key feature of our bounds is that the leading term, $\Sigma^y$, exactly matches with the covariance in the Central Limit Theorem (CLT) for the two-time-scale SA, and we call them tight finite-time bounds. We illustrate their use in RL by establishing sample complexity for off-policy algorithms, TDC, GTD, and GTD2. A special case of linear two-time-scale SA that is extensively studied is linear SA with Polyak-Ruppert averaging. We present tight finite time bounds corresponding to the covariance matrix of the CLT. Such bounds can be used to study TD-learning with Polyak-Ruppert averaging.
comment: 83 pages, 6 figures
Near-optimal Sensor Placement for Detecting Stochastic Target Trajectories in Barrier Coverage Systems
This paper addresses the deployment of sensors for a 2-D barrier coverage system. The challenge is to compute near-optimal sensor placements for detecting targets whose trajectories follow a log-Gaussian Cox line process. We explore sensor deployment in a transformed space, where linear target trajectories are represented as points. While this space simplifies handling the line process, the spatial functions representing sensor performance (i.e. probability of detection) become less intuitive. To illustrate our approach, we focus on positioning sensors of the barrier coverage system on the seafloor to detect passing ships. Through numerical experiments using historical ship data, we compute sensor locations that maximize the probability all ship passing over the barrier coverage system are detected.
comment: This work is published in IEEE SysCon 2025
Neural Predictor for Flight Control with Payload
Aerial robotics for transporting suspended payloads as the form of freely-floating manipulator are growing great interest in recent years. However, the force/torque caused by payload and residual dynamics will introduce unmodeled perturbations to the aerial robotics, which negatively affects the closed-loop performance. Different from estimation-like methods, this paper proposes Neural Predictor, a learning-based approach to model force/torque induced by payload and residual dynamics as a dynamical system. It yields a hybrid model that combines the first-principles dynamics with the learned dynamics. The hybrid model is then integrated into a MPC framework to improve closed-loop performance. Effectiveness of proposed framework is verified extensively in both numerical simulations and real-world flight experiments. The results indicate that our approach can capture force/torque caused by suspended payload and residual dynamics accurately, respond quickly to the changes of them and improve the closed-loop performance significantly. In particular, Neural Predictor outperforms a state-of-the-art learning-based estimator and has reduced the force and torque estimation errors by up to 66.15% and 33.33% while requiring less samples. The code of proposed Neural Predictor can be found at https://github.com/NPU-RCIR/Neural-Predictor.git.
comment: This paper (longer version) has been accepted in RA-L
Exploring Gen-AI applications in building research and industry: A review
This paper investigates the transformative potential of Generative AI (Gen-AI) technologies, particularly large language models, within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as automated compliance checking and building design assistance. The research highlights how Gen-AI can automate labor-intensive processes, significantly improving efficiency and reducing costs in building practices. The paper first discusses the two widely applied fundamental models-Transformer and Diffusion model-and summarizes current pathways for accessing Gen-AI models and the most common techniques for customizing them. It then explores applications for text generation, such as compliance checking, control support, data mining, and building simulation input file editing. Additionally, it examines image generation, including direct generation through diffusion models and indirect generation through language model-supported template creation based on existing Computer-Aided Design or other design tools with rendering. The paper concludes with a comprehensive analysis of the current capabilities of Gen-AI in the building industry, outlining future directions for research and development, with the goal of paving the way for smarter, more effective, and responsive design, construction, and operational practices.
comment: This is a pre-peer review and copy editing version of an article published in Building Simulation. The final authenticated version is available online at:https://doi.org/10.1007/s12273-025-1279-x
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
Automated Fuzzing of Automotive Control Units
Modern vehicles are governed by a network of Electronic Control Units (ECUs), which are programmed to sense inputs from the driver and the environment, to process these inputs, and to control actuators that, e.g., regulate the engine or even control the steering system. ECUs within a vehicle communicate via automotive bus systems such as the Controller Area Network (CAN), and beyond the vehicles boundaries through upcoming vehicle-to-vehicle and vehicle-to-infrastructure channels. Approaches to manipulate the communication between ECUs for the purpose of security testing and reverse-engineering of vehicular functions have been presented in the past, all of which struggle with automating the detection of system change in response to message injection. In this paper we present our findings with fuzzing CAN networks, in particular while observing individual ECUs with a sensor harness. The harness detects physical responses, which we then use in a oracle functions to inform the fuzzing process. We systematically define fuzzers, fuzzing configurations and oracle functions for testing ECUs. We evaluate our approach based on case studies of commercial instrument clusters and with an experimental framework for CAN authentication. Our results show that the approach is capable of identifying interesting ECU states with a high level of automation. Our approach is applicable in distributed cyber-physical systems beyond automotive computing.
comment: Appeared in 2019 International Workshop on Attacks and Defenses for Internet-of-Things (ADIoT) / International Workshop on the Secure Internet of Things (SIoT)
Systems and Control (EESS)
AI-Driven Optimization of Wave-Controlled Reconfigurable Intelligent Surfaces
A promising type of Reconfigurable Intelligent Surface (RIS) employs tunable control of its varactors using biasing transmission lines below the RIS reflecting elements. Biasing standing waves (BSWs) are excited by a time-periodic signal and sampled at each RIS element to create a desired biasing voltage and control the reflection coefficients of the elements. A simple rectifier can be used to sample the voltages and capture the peaks of the BSWs over time. Like other types of RIS, attempting to model and accurately configure a wave-controlled RIS is extremely challenging due to factors such as device non-linearities, frequency dependence, element coupling, etc., and thus significant differences will arise between the actual and assumed performance. An alternative approach to solving this problem is data-driven: Using training data obtained by sampling the reflected radiation pattern of the RIS for a set of BSWs, a neural network (NN) is designed to create an input-output map between the BSW amplitudes and the resulting sampled radiation pattern. This is the approach discussed in this paper. In the proposed approach, the NN is optimized using a genetic algorithm (GA) to minimize the error between the predicted and measured radiation patterns. The BSW amplitudes are then designed via Simulated Annealing (SA) to optimize a signal-to-leakage-plus-noise ratio measure by iteratively forward-propagating the BSW amplitudes through the NN and using its output as feedback to determine convergence. The resulting optimal solutions are stored in a lookup table to be used both as settings to instantly configure the RIS and as a basis for determining more complex radiation patterns.
comment: 22 pages, 11 figures, 3 tables
OPTIKS: Optimized Gradient Properties Through Timing in K-Space
A customizable method (OPTIKS) for designing fast trajectory-constrained gradient waveforms with optimized time domain properties was developed. Given a specified multidimensional k-space trajectory, the method optimizes traversal speed (and therefore timing) with position along the trajectory. OPTIKS facilitates optimization of objectives dependent on the time domain gradient waveform and the arc-length domain k-space speed. OPTIKS is applied to design waveforms which limit peripheral nerve stimulation (PNS), minimize mechanical resonance excitation, and reduce acoustic noise. A variety of trajectory examples are presented including spirals, circular echo-planar-imaging, and rosettes. Design performance is evaluated based on duration, standardized PNS models, field measurements, gradient coil back-EMF measurements, and calibrated acoustic measurements. We show reductions in back-EMF of up to 94% and field oscillations up to 91.1%, acoustic noise decreases of up to 9.22 dB, and with efficient use of PNS models speed increases of up to 11.4%. The design method implementation is made available as an open source Python package through GitHub.
YANNs: Y-wise Affine Neural Networks for Exact and Efficient Representations of Piecewise Linear Functions
This work formally introduces Y-wise Affine Neural Networks (YANNs), a fully-explainable network architecture that continuously and efficiently represent piecewise affine functions with polytopic subdomains. Following from the proofs, it is shown that the development of YANNs requires no training to achieve the functionally equivalent representation. YANNs thus maintain all mathematical properties of the original formulations. Multi-parametric model predictive control is utilized as an application showcase of YANNs, which theoretically computes optimal control laws as a piecewise affine function of states, outputs, setpoints, and disturbances. With the exact representation of multi-parametric control laws, YANNs retain essential control-theoretic guarantees such as recursive feasibility and stability. This sets YANNs apart from the existing works which apply neural networks for approximating optimal control laws instead of exactly representing them. By optimizing the inference speed of the networks, YANNs can evaluate substantially faster in real-time compared to traditional piecewise affine function calculations. Numerical case studies are presented to demonstrate the algorithmic scalability with respect to the input/output dimensions and the number of subdomains. YANNs represent a significant advancement in control as the first neural network-based controller that inherently ensures both feasibility and stability. Future applications can leverage them as an efficient and interpretable starting point for data-driven modeling/control.
Design and Experimental Test of Datatic Approximate Optimal Filter in Nonlinear Dynamic Systems
Filtering is crucial in engineering fields, providing vital state estimation for control systems. However, the nonlinear nature of complex systems and the presence of non-Gaussian noises pose significant challenges to the performance of conventional filtering methods in terms of estimation accuracy and computational efficiency. In this work, we present a data-driven closed-loop filter, termed datatic approximate optimal filter (DAOF), specifically designed for nonlinear systems under non-Gaussian conditions. We first formulate a Markovian filtering problem (MFP), which inherently shares a connection with reinforcement learning (RL) as it aims to compute the optimal state estimate by minimizing the accumulated error. To solve MFP, we propose DAOF, which primarily incorporates a trained RL policy and features two distinct structural designs: DAOF-v1 and DAOF-v2. Designed for systems with explicit models, DAOF-v1 combines prediction and update phases, with the RL policy generating the update value. Meanwhile, DAOF-v2 bypasses system modeling by directly outputting the state estimate. Then, we utilize an actor-critic algorithm to learn the parameterized policy for DAOF. Experimental results on a 2-degree-of-freedom (2-DOF) vehicle system, equipped with explicit system models, demonstrate the superior accuracy and computational efficiency of DAOF-v1 compared to existing nonlinear filters. Moreover, DAOF-v2 showcases its unique ability to perform filtering without requiring explicit system modeling, as validated by a 14-DOF vehicle system.
Decentralized Stealth Attacks on Cyber-Physical Systems
Decentralized stealth attack constructions that minimize the mutual information between the state variables and the measurements are proposed. The attack constructions are formulated as random Gaussian attacks targeting Cyber-physical systems that aims at minimizing the mutual information between the state variables and measurements while constraining the Kullback-Leibler divergence between the distribution of the measurements under attacks and the distribution of the measurements without attacks. The proposed information metrics adopted measure the disruption and attack detection both globally and locally. The decentralized attack constructions are formulated in a framework of normal games. The global and local information metrics yield games with global and local objectives in disruption and attack detection. We have proven the games are potential games and the convexity of the potential functions followed by the uniqueness and the achievability of the Nash Equilibrium, accordingly. We proposed a best response dynamics to achieve the Nash Equilibrium of the games. We numerically evaluate the performance of the proposed decentralized stealth random attacks on IEEE test systems and show it is feasible to exploit game theoretic techniques in decentralized attack constructions.
comment: 14 pages, 6 figures, 3 pages for appendix. Submitted to IEEE Transactions on Information Forensics and Security
Robust Control of Uncertain Switched Affine Systems via Scenario Optimization
Switched affine systems are often used to model and control complex dynamical systems that operate in multiple modes. However, uncertainties in the system matrices can challenge their stability and performance. This paper introduces a new approach for designing switching control laws for uncertain switched affine systems using data-driven scenario optimization. Instead of relaxing invariant sets, our method creates smaller invariant sets with quadratic Lyapunov functions through scenario-based optimization, effectively reducing chattering effects and regulation error. The framework ensures robustness against parameter uncertainties while improving accuracy. We validate our method with applications in multi-objective interval Markov decision processes and power electronic converters, demonstrating its effectiveness.
Optimal Current Control Strategy for Reliable Power Electronics Converters: Frequency-Domain Approach
Power electronics converters are key enablers in the global energy transition for power generation, industrial and mobility applications; they convert electrical power in a controlled, reliable and efficient manner. The semiconductor switching devices, at the core of power converters, are the most likely component to fail due to the damage caused by the current-induced temperature cycling. Damage models of semiconductors have been developed and employed to study their reliability, improve their design and to estimate the lifetime of the converter in various power applications. However, those models can offer more if employed in the design of strategies to actively operate the converter. Specifically, properly controlling the current, and hence the temperature cycling, can effectively contribute to reducing the accumulated damage in the semiconductor and increase its reliability and lifetime. In this paper we propose a novel current control approach that integrates reliability requirements into the design framework, based on a frequency-domain model of the semiconductor damage.
Time-Modulated EM Skins for Integrated Sensing and Communications
An innovative solution, based on the exploitation of the harmonic beams generated by time-modulated electromagnetic skins (TM-EMSs), is proposed for the implementation of integrated sensing and communication (ISAC) functionalities in a Smart Electromagnetic Environment (SEME) scenario. More in detail, the field radiated by a user terminal, located at an unknown position, is assumed to illuminate a passive TM-EMS that, thanks to a suitable modulation of the local reflection coefficients at the meta-atom level of the EMS surface, simultaneously reflects towards a receiving base station (BS) a "sum" beam and a "difference" one at slightly different frequencies. By processing the received signals and exploiting monopulse radar tracking concepts, the BS both localizes the user terminal and, as a by-product, establishes a communication link with it by leveraging on the "sum" reflected beam. Towards this purpose, the arising harmonic beam control problem is reformulated as a global optimization one, which is successively solved by means of an evolutionary iterative approach to determine the desired TM-EMS modulation sequence. The results from selected numerical and experimental tests are reported to assess the effectiveness and the reliability of the proposed approach.
Realistic Counterfactual Explanations for Machine Learning-Controlled Mobile Robots using 2D LiDAR
This paper presents a novel method for generating realistic counterfactual explanations (CFEs) in machine learning (ML)-based control for mobile robots using 2D LiDAR. ML models, especially artificial neural networks (ANNs), can provide advanced decision-making and control capabilities by learning from data. However, they often function as black boxes, making it challenging to interpret them. This is especially a problem in safety-critical control applications. To generate realistic CFEs, we parameterize the LiDAR space with simple shapes such as circles and rectangles, whose parameters are chosen by a genetic algorithm, and the configurations are transformed into LiDAR data by raycasting. Our model-agnostic approach generates CFEs in the form of synthetic LiDAR data that resembles a base LiDAR state but is modified to produce a pre-defined ML model control output based on a query from the user. We demonstrate our method on a mobile robot, the TurtleBot3, controlled using deep reinforcement learning (DRL) in real-world and simulated scenarios. Our method generates logical and realistic CFEs, which helps to interpret the DRL agent's decision making. This paper contributes towards advancing explainable AI in mobile robotics, and our method could be a tool for understanding, debugging, and improving ML-based autonomous control.
comment: Accepted for publication at the 2025 European Control Conference (ECC)
Nonlinear Model Predictive Control for Leaderless UAV Formation Flying with Collision Avoidance under Directed Graphs
This paper studies the leaderless formation flying problem with collision avoidance for a group of unmanned aerial vehicles (UAVs), which requires the UAVs to navigate through cluttered environments without colliding while maintaining the formation. The communication network among the UAVs is structured as a directed graph that includes a directed spanning tree. A novel distributed nonlinear model predictive control (NMPC) method based on the model reference adaptive consensus (MRACon) framework is proposed. Within this framework, each UAV tracks an assigned reference output generated by a linear reference model that utilizes relative measurements as input. Subsequently, the NMPC method penalizes the tracking error between the output of the reference model and that of the actual model while also establishing constraint sets for collision avoidance and physical limitations to achieve distributed and safe formation control. Finally, simulations and hardware experiments are conducted to verify the effectiveness of the proposed method.
Secure Safety Filter: Towards Safe Flight Control under Sensor Attacks IROS 2025
Modern autopilot systems are prone to sensor attacks that can jeopardize flight safety. To mitigate this risk, we proposed a modular solution: the secure safety filter, which extends the well-established control barrier function (CBF)-based safety filter to account for, and mitigate, sensor attacks. This module consists of a secure state reconstructor (which generates plausible states) and a safety filter (which computes the safe control input that is closest to the nominal one). Differing from existing work focusing on linear, noise-free systems, the proposed secure safety filter handles bounded measurement noise and, by leveraging reduced-order model techniques, is applicable to the nonlinear dynamics of drones. Software-in-the-loop simulations and drone hardware experiments demonstrate the effectiveness of the secure safety filter in rendering the system safe in the presence of sensor attacks.
comment: Submitted to IROS 2025
Secure Safety Filter Design for Sampled-data Nonlinear Systems under Sensor Spoofing Attacks
This paper presents a secure safety filter design for nonlinear systems under sensor spoofing attacks. Existing approaches primarily focus on linear systems which limits their applications in real-world scenarios. In this work, we extend these results to nonlinear systems in a principled way. We introduce exact observability maps that abstract specific state estimation algorithms and extend them to a secure version capable of handling sensor attacks. Our generalization also applies to the relaxed observability case, with slightly relaxed guarantees. More importantly, we propose a secure safety filter design in both exact and relaxed cases, which incorporates secure state estimation and a control barrier function-enabled safety filter. The proposed approach provides theoretical safety guarantees for nonlinear systems in the presence of sensor attacks. We numerically validate our analysis on a unicycle vehicle equipped with redundant yet partly compromised sensors.
comment: Submitted to IEEE CDC 2025
Dynamic Safety in Complex Environments: Synthesizing Safety Filters with Poisson's Equation
Synthesizing safe sets for robotic systems operating in complex and dynamically changing environments is a challenging problem. Solving this problem can enable the construction of safety filters that guarantee safe control actions -- most notably by employing Control Barrier Functions (CBFs). This paper presents an algorithm for generating safe sets from perception data by leveraging elliptic partial differential equations, specifically Poisson's equation. Given a local occupancy map, we solve Poisson's equation subject to Dirichlet boundary conditions, with a novel forcing function. Specifically, we design a smooth guidance vector field, which encodes gradient information required for safety. The result is a variational problem for which the unique minimizer -- a safety function -- characterizes the safe set. After establishing our theoretical result, we illustrate how safety functions can be used in CBF-based safety filtering. The real-time utility of our synthesis method is highlighted through hardware demonstrations on quadruped and humanoid robots navigating dynamically changing obstacle-filled environments.
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator using noisy samples and widely used in optimization and Reinforcement Learning (RL). The noise in RL exhibits a Markovian structure, and in some cases, such as gradient temporal difference (GTD) methods, SA is employed in a two-time-scale framework. This combination introduces significant theoretical challenges for analysis. We derive an upper bound on the error for the iterations of linear two-time-scale SA with Markovian noise. We demonstrate that the mean squared error decreases as $trace (\Sigma^y)/k + o(1/k)$ where $k$ is the number of iterates, and $\Sigma^y$ is an appropriately defined covariance matrix. A key feature of our bounds is that the leading term, $\Sigma^y$, exactly matches with the covariance in the Central Limit Theorem (CLT) for the two-time-scale SA, and we call them tight finite-time bounds. We illustrate their use in RL by establishing sample complexity for off-policy algorithms, TDC, GTD, and GTD2. A special case of linear two-time-scale SA that is extensively studied is linear SA with Polyak-Ruppert averaging. We present tight finite time bounds corresponding to the covariance matrix of the CLT. Such bounds can be used to study TD-learning with Polyak-Ruppert averaging.
comment: 83 pages, 6 figures
Near-optimal Sensor Placement for Detecting Stochastic Target Trajectories in Barrier Coverage Systems
This paper addresses the deployment of sensors for a 2-D barrier coverage system. The challenge is to compute near-optimal sensor placements for detecting targets whose trajectories follow a log-Gaussian Cox line process. We explore sensor deployment in a transformed space, where linear target trajectories are represented as points. While this space simplifies handling the line process, the spatial functions representing sensor performance (i.e. probability of detection) become less intuitive. To illustrate our approach, we focus on positioning sensors of the barrier coverage system on the seafloor to detect passing ships. Through numerical experiments using historical ship data, we compute sensor locations that maximize the probability all ship passing over the barrier coverage system are detected.
comment: This work is published in IEEE SysCon 2025
Neural Predictor for Flight Control with Payload
Aerial robotics for transporting suspended payloads as the form of freely-floating manipulator are growing great interest in recent years. However, the force/torque caused by payload and residual dynamics will introduce unmodeled perturbations to the aerial robotics, which negatively affects the closed-loop performance. Different from estimation-like methods, this paper proposes Neural Predictor, a learning-based approach to model force/torque induced by payload and residual dynamics as a dynamical system. It yields a hybrid model that combines the first-principles dynamics with the learned dynamics. The hybrid model is then integrated into a MPC framework to improve closed-loop performance. Effectiveness of proposed framework is verified extensively in both numerical simulations and real-world flight experiments. The results indicate that our approach can capture force/torque caused by suspended payload and residual dynamics accurately, respond quickly to the changes of them and improve the closed-loop performance significantly. In particular, Neural Predictor outperforms a state-of-the-art learning-based estimator and has reduced the force and torque estimation errors by up to 66.15% and 33.33% while requiring less samples. The code of proposed Neural Predictor can be found at https://github.com/NPU-RCIR/Neural-Predictor.git.
comment: This paper (longer version) has been accepted in RA-L
Exploring Gen-AI applications in building research and industry: A review
This paper investigates the transformative potential of Generative AI (Gen-AI) technologies, particularly large language models, within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as automated compliance checking and building design assistance. The research highlights how Gen-AI can automate labor-intensive processes, significantly improving efficiency and reducing costs in building practices. The paper first discusses the two widely applied fundamental models-Transformer and Diffusion model-and summarizes current pathways for accessing Gen-AI models and the most common techniques for customizing them. It then explores applications for text generation, such as compliance checking, control support, data mining, and building simulation input file editing. Additionally, it examines image generation, including direct generation through diffusion models and indirect generation through language model-supported template creation based on existing Computer-Aided Design or other design tools with rendering. The paper concludes with a comprehensive analysis of the current capabilities of Gen-AI in the building industry, outlining future directions for research and development, with the goal of paving the way for smarter, more effective, and responsive design, construction, and operational practices.
comment: This is a pre-peer review and copy editing version of an article published in Building Simulation. The final authenticated version is available online at:https://doi.org/10.1007/s12273-025-1279-x
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model correction drift controller that integrates Model Predictive Control (MPC) with Gaussian Process Regression (GPR). GPR is employed to correct vehicle model mismatches during both drift equilibrium solving and the MPC optimization process. Additionally, the variance from GPR is utilized to actively explore different cornering drifting velocities, aiming to minimize trajectory tracking errors. The proposed algorithm is validated through simulations on the Simulink-Carsim platform and experiments with a 1:10 scale RC vehicle. In the simulation, the average lateral error with GPR is reduced by 52.8% compared to the non-GPR case. Incorporating exploration further decreases this error by 27.1%. The velocity tracking Root Mean Square Error (RMSE) also decreases by 10.6% with exploration. In the RC car experiment, the average lateral error with GPR is 36.7% lower, and exploration further leads to a 29.0% reduction. Moreover, the velocity tracking RMSE decreases by 7.2% with the inclusion of exploration.
Automated Fuzzing of Automotive Control Units
Modern vehicles are governed by a network of Electronic Control Units (ECUs), which are programmed to sense inputs from the driver and the environment, to process these inputs, and to control actuators that, e.g., regulate the engine or even control the steering system. ECUs within a vehicle communicate via automotive bus systems such as the Controller Area Network (CAN), and beyond the vehicles boundaries through upcoming vehicle-to-vehicle and vehicle-to-infrastructure channels. Approaches to manipulate the communication between ECUs for the purpose of security testing and reverse-engineering of vehicular functions have been presented in the past, all of which struggle with automating the detection of system change in response to message injection. In this paper we present our findings with fuzzing CAN networks, in particular while observing individual ECUs with a sensor harness. The harness detects physical responses, which we then use in a oracle functions to inform the fuzzing process. We systematically define fuzzers, fuzzing configurations and oracle functions for testing ECUs. We evaluate our approach based on case studies of commercial instrument clusters and with an experimental framework for CAN authentication. Our results show that the approach is capable of identifying interesting ECU states with a high level of automation. Our approach is applicable in distributed cyber-physical systems beyond automotive computing.
comment: Appeared in 2019 International Workshop on Attacks and Defenses for Internet-of-Things (ADIoT) / International Workshop on the Secure Internet of Things (SIoT)
Robotics
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and often researchers lack availability to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving to semi-fullscale validation with the R/V milliAmpere 1 passenger ferry and full-scale validation using the R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC) including autonomy.
FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation
Humanoid loco-manipulation holds transformative potential for daily service and industrial tasks, yet achieving precise, robust whole-body control with 3D end-effector force interaction remains a major challenge. Prior approaches are often limited to lightweight tasks or quadrupedal/wheeled platforms. To overcome these limitations, we propose FALCON, a dual-agent reinforcement-learning-based framework for robust force-adaptive humanoid loco-manipulation. FALCON decomposes whole-body control into two specialized agents: (1) a lower-body agent ensuring stable locomotion under external force disturbances, and (2) an upper-body agent precisely tracking end-effector positions with implicit adaptive force compensation. These two agents are jointly trained in simulation with a force curriculum that progressively escalates the magnitude of external force exerted on the end effector while respecting torque limits. Experiments demonstrate that, compared to the baselines, FALCON achieves 2x more accurate upper-body joint tracking, while maintaining robust locomotion under force disturbances and achieving faster training convergence. Moreover, FALCON enables policy training without embodiment-specific reward or curriculum tuning. Using the same training setup, we obtain policies that are deployed across multiple humanoids, enabling forceful loco-manipulation tasks such as transporting payloads (0-20N force), cart-pulling (0-100N), and door-opening (0-40N) in the real world.
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot reinforcement learning (MRRL) policies with realistic robot dynamics and safety constraints, supporting both parallelization and hardware acceleration. Our generalizable learning interface provides an easy-to-use integration with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a realistic robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation.
comment: 22 pages, 14 figures, 10 tables
Investigating Robotaxi Crash Severity Using Geographical Random Forest
This paper quantitatively investigates the crash severity of Autonomous Vehicles (AVs) with spatially localized machine learning and macroscopic measures of the urban built environment. We address spatial heterogeneity and spatial autocorrelation, while focusing on land use patterns and human behavior. Our Geographical Random Forest (GRF) model, accompanied with a crash severity risk map of San Francisco, presents three findings that are useful for commercial operations of AVs and robotaxis. First, spatially localized machine learning performed better than regular machine learning, when predicting AV crash severity. Bias-variance tradeoff was evident as we adjust the localization weight hyperparameter. Second, land use was the most important built environment measure, compared to intersections, building footprints, public transit stops, and Points Of Interests (POIs). Third, it was predicted that city center areas with greater diversity and commercial activities were more likely to result in low-severity AV crashes, than residential neighborhoods. Residential land use may be associated with higher severity due to human behavior and less restrictive environment. This paper recommends to explicitly consider geographic locations, and to design safety measures specific to residential neighborhoods, when robotaxi operators train their AV systems.
comment: 21 pages, 8 figures
Learned IMU Bias Prediction for Invariant Visual Inertial Odometry
Autonomous mobile robots operating in novel environments depend critically on accurate state estimation, often utilizing visual and inertial measurements. Recent work has shown that an invariant formulation of the extended Kalman filter improves the convergence and robustness of visual-inertial odometry by utilizing the Lie group structure of a robot's position, velocity, and orientation states. However, inertial sensors also require measurement bias estimation, yet introducing the bias in the filter state breaks the Lie group symmetry. In this paper, we design a neural network to predict the bias of an inertial measurement unit (IMU) from a sequence of previous IMU measurements. This allows us to use an invariant filter for visual inertial odometry, relying on the learned bias prediction rather than introducing the bias in the filter state. We demonstrate that an invariant multi-state constraint Kalman filter (MSCKF) with learned bias predictions achieves robust visual-inertial odometry in real experiments, even when visual information is unavailable for extended periods and the system needs to rely solely on IMU measurements.
comment: This work has been submitted to the IEEE for possible publication
M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark
We introduce M$^3$CAD, a novel benchmark designed to advance research in generic cooperative autonomous driving. M$^3$CAD comprises 204 sequences with 30k frames, spanning a diverse range of cooperative driving scenarios. Each sequence includes multiple vehicles and sensing modalities, e.g., LiDAR point clouds, RGB images, and GPS/IMU, supporting a variety of autonomous driving tasks, including object detection and tracking, mapping, motion forecasting, occupancy prediction, and path planning. This rich multimodal setup enables M$^3$CAD to support both single-vehicle and multi-vehicle autonomous driving research, significantly broadening the scope of research in the field. To our knowledge, M$^3$CAD is the most comprehensive benchmark specifically tailored for cooperative multi-task autonomous driving research. We evaluate the state-of-the-art end-to-end solution on M$^3$CAD to establish baseline performance. To foster cooperative autonomous driving research, we also propose E2EC, a simple yet effective framework for cooperative driving solution that leverages inter-vehicle shared information for improved path planning. We release M$^3$CAD, along with our baseline models and evaluation results, to support the development of robust cooperative autonomous driving systems. All resources will be made publicly available on https://github.com/zhumorui/M3CAD
comment: supplementary material included
TPK: Trustworthy Trajectory Prediction Integrating Prior Knowledge For Interpretability and Kinematic Feasibility
Trajectory prediction is crucial for autonomous driving, enabling vehicles to navigate safely by anticipating the movements of surrounding road users. However, current deep learning models often lack trustworthiness as their predictions can be physically infeasible and illogical to humans. To make predictions more trustworthy, recent research has incorporated prior knowledge, like the social force model for modeling interactions and kinematic models for physical realism. However, these approaches focus on priors that suit either vehicles or pedestrians and do not generalize to traffic with mixed agent classes. We propose incorporating interaction and kinematic priors of all agent classes--vehicles, pedestrians, and cyclists with class-specific interaction layers to capture agent behavioral differences. To improve the interpretability of the agent interactions, we introduce DG-SFM, a rule-based interaction importance score that guides the interaction layer. To ensure physically feasible predictions, we proposed suitable kinematic models for all agent classes with a novel pedestrian kinematic model. We benchmark our approach on the Argoverse 2 dataset, using the state-of-the-art transformer HPTR as our baseline. Experiments demonstrate that our method improves interaction interpretability, revealing a correlation between incorrect predictions and divergence from our interaction prior. Even though incorporating the kinematic models causes a slight decrease in accuracy, they eliminate infeasible trajectories found in the dataset and the baseline model. Thus, our approach fosters trust in trajectory prediction as its interaction reasoning is interpretable, and its predictions adhere to physics.
comment: Accepted in the 36th IEEE Intelligent Vehicles Symposium (IV 2025) for oral presentation
Boundary-Guided Trajectory Prediction for Road Aware and Physically Feasible Autonomous Driving
Accurate prediction of surrounding road users' trajectories is essential for safe and efficient autonomous driving. While deep learning models have improved performance, challenges remain in preventing off-road predictions and ensuring kinematic feasibility. Existing methods incorporate road-awareness modules and enforce kinematic constraints but lack plausibility guarantees and often introduce trade-offs in complexity and flexibility. This paper proposes a novel framework that formulates trajectory prediction as a constrained regression guided by permissible driving directions and their boundaries. Using the agent's current state and an HD map, our approach defines the valid boundaries and ensures on-road predictions by training the network to learn superimposed paths between left and right boundary polylines. To guarantee feasibility, the model predicts acceleration profiles that determine the vehicle's travel distance along these paths while adhering to kinematic constraints. We evaluate our approach on the Argoverse-2 dataset against the HPTR baseline. Our approach shows a slight decrease in benchmark metrics compared to HPTR but notably improves final displacement error and eliminates infeasible trajectories. Moreover, the proposed approach has superior generalization to less prevalent maneuvers and unseen out-of-distribution scenarios, reducing the off-road rate under adversarial attacks from 66\% to just 1\%. These results highlight the effectiveness of our approach in generating feasible and robust predictions.
comment: Accepted in the 36th IEEE Intelligent Vehicles Symposium (IV 2025)
Balancing Progress and Safety: A Novel Risk-Aware Objective for RL in Autonomous Driving
Reinforcement Learning (RL) is a promising approach for achieving autonomous driving due to robust decision-making capabilities. RL learns a driving policy through trial and error in traffic scenarios, guided by a reward function that combines the driving objectives. The design of such reward function has received insufficient attention, yielding ill-defined rewards with various pitfalls. Safety, in particular, has long been regarded only as a penalty for collisions. This leaves the risks associated with actions leading up to a collision unaddressed, limiting the applicability of RL in real-world scenarios. To address these shortcomings, our work focuses on enhancing the reward formulation by defining a set of driving objectives and structuring them hierarchically. Furthermore, we discuss the formulation of these objectives in a normalized manner to transparently determine their contribution to the overall reward. Additionally, we introduce a novel risk-aware objective for various driving interactions based on a two-dimensional ellipsoid function and an extension of Responsibility-Sensitive Safety (RSS) concepts. We evaluate the efficacy of our proposed reward in unsignalized intersection scenarios with varying traffic densities. The approach decreases collision rates by 21\% on average compared to baseline rewards and consistently surpasses them in route progress and cumulative reward, demonstrating its capability to promote safer driving behaviors while maintaining high-performance levels.
comment: Accepted in the 36th IEEE Intelligent vehicles Symposium (IV 2025)
STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation
Vision-Language Models (VLMs) have been increasingly integrated into object navigation tasks for their rich prior knowledge and strong reasoning abilities. However, applying VLMs to navigation poses two key challenges: effectively representing complex environment information and determining \textit{when and how} to query VLMs. Insufficient environment understanding and over-reliance on VLMs (e.g. querying at every step) can lead to unnecessary backtracking and reduced navigation efficiency, especially in continuous environments. To address these challenges, we propose a novel framework that constructs a multi-layer representation of the environment during navigation. This representation consists of viewpoint, object nodes, and room nodes. Viewpoints and object nodes facilitate intra-room exploration and accurate target localization, while room nodes support efficient inter-room planning. Building on this representation, we propose a novel two-stage navigation policy, integrating high-level planning guided by VLM reasoning with low-level VLM-assisted exploration to efficiently locate a goal object. We evaluated our approach on three simulated benchmarks (HM3D, RoboTHOR, and MP3D), and achieved state-of-the-art performance on both the success rate ($\mathord{\uparrow}\, 7.1\%$) and navigation efficiency ($\mathord{\uparrow}\, 12.5\%$). We further validate our method on a real robot platform, demonstrating strong robustness across 15 object navigation tasks in 10 different indoor environments. Project page is available at https://zwandering.github.io/STRIVE.github.io/ .
Motion Planning for Autonomous Vehicles: When Model Predictive Control Meets Ensemble Kalman Smoothing
Safe and efficient motion planning is of fundamental importance for autonomous vehicles. This paper investigates motion planning based on nonlinear model predictive control (NMPC) over a neural network vehicle model. We aim to overcome the high computational costs that arise in NMPC of the neural network model due to the highly nonlinear and nonconvex optimization. In a departure from numerical optimization solutions, we reformulate the problem of NMPC-based motion planning as a Bayesian estimation problem, which seeks to infer optimal planning decisions from planning objectives. Then, we use a sequential ensemble Kalman smoother to accomplish the estimation task, exploiting its high computational efficiency for complex nonlinear systems. The simulation results show an improvement in computational speed by orders of magnitude, indicating the potential of the proposed approach for practical motion planning.
3D Characterization of Smoke Plume Dispersion Using Multi-View Drone Swarm
This study presents an advanced multi-view drone swarm imaging system for the three-dimensional characterization of smoke plume dispersion dynamics. The system comprises a manager drone and four worker drones, each equipped with high-resolution cameras and precise GPS modules. The manager drone uses image feedback to autonomously detect and position itself above the plume, then commands the worker drones to orbit the area in a synchronized circular flight pattern, capturing multi-angle images. The camera poses of these images are first estimated, then the images are grouped in batches and processed using Neural Radiance Fields (NeRF) to generate high-resolution 3D reconstructions of plume dynamics over time. Field tests demonstrated the ability of the system to capture critical plume characteristics including volume dynamics, wind-driven directional shifts, and lofting behavior at a temporal resolution of about 1 s. The 3D reconstructions generated by this system provide unique field data for enhancing the predictive models of smoke plume dispersion and fire spread. Broadly, the drone swarm system offers a versatile platform for high resolution measurements of pollutant emissions and transport in wildfires, volcanic eruptions, prescribed burns, and industrial processes, ultimately supporting more effective fire control decisions and mitigating wildfire risks.
comment: 10 pages, 8 figures
ACORN: Adaptive Contrastive Optimization for Safe and Robust Fine-Grained Robotic Manipulation
Embodied AI research has traditionally emphasized performance metrics such as success rate and cumulative reward, overlooking critical robustness and safety considerations that emerge during real-world deployment. In actual environments, agents continuously encounter unpredicted situations and distribution shifts, causing seemingly reliable policies to experience catastrophic failures, particularly in manipulation tasks. To address this gap, we introduce four novel safety-centric metrics that quantify an agent's resilience to environmental perturbations. Building on these metrics, we present Adaptive Contrastive Optimization for Robust Manipulation (ACORN), a plug-and-play algorithm that enhances policy robustness without sacrificing performance. ACORN leverages contrastive learning to simultaneously align trajectories with expert demonstrations while diverging from potentially unsafe behaviors. Our approach efficiently generates informative negative samples through structured Gaussian noise injection, employing a double perturbation technique that maintains sample diversity while minimizing computational overhead. Comprehensive experiments across diverse manipulation environments validate ACORN's effectiveness, yielding improvements of up to 23% in safety metrics under disturbance compared to baseline methods. These findings underscore ACORN's significant potential for enabling reliable deployment of embodied agents in safety-critical real-world applications.
comment: 6 pages,4 figures
Emergent Multi-View Fidelity in Autonomous UAV Swarm Sport Injury Detection
Accurate, real-time collision detection is essential for ensuring player safety and effective refereeing in high-contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision-monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real time kinematics extraction from videos of collision events. In this paper, we show that the strategies operating on the objective of ensuring at least one UAV captures every incident on the pitch have an emergent property of fulfilling a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi-view fidelity and redundancy, while not requiring any drone-to-drone communication.
comment: Accepted for 2025 8th International Balkan Conference on Communications and Networking (Balkancom)
JAEGER: Dual-Level Humanoid Whole-Body Controller
This paper presents JAEGER, a dual-level whole-body controller for humanoid robots that addresses the challenges of training a more robust and versatile policy. Unlike traditional single-controller approaches, JAEGER separates the control of the upper and lower bodies into two independent controllers, so that they can better focus on their distinct tasks. This separation alleviates the dimensionality curse and improves fault tolerance. JAEGER supports both root velocity tracking (coarse-grained control) and local joint angle tracking (fine-grained control), enabling versatile and stable movements. To train the controller, we utilize a human motion dataset (AMASS), retargeting human poses to humanoid poses through an efficient retargeting network, and employ a curriculum learning approach. This method performs supervised learning for initialization, followed by reinforcement learning for further exploration. We conduct our experiments on two humanoid platforms and demonstrate the superiority of our approach against state-of-the-art methods in both simulation and real environments.
comment: 15 pages, 2 figures
Quadrupedal Robot Skateboard Mounting via Reverse Curriculum Learning
The aim of this work is to enable quadrupedal robots to mount skateboards using Reverse Curriculum Reinforcement Learning. Although prior work has demonstrated skateboarding for quadrupeds that are already positioned on the board, the initial mounting phase still poses a significant challenge. A goal-oriented methodology was adopted, beginning with the terminal phases of the task and progressively increasing the complexity of the problem definition to approximate the desired objective. The learning process was initiated with the skateboard rigidly fixed within the global coordinate frame and the robot positioned directly above it. Through gradual relaxation of these initial conditions, the learned policy demonstrated robustness to variations in skateboard position and orientation, ultimately exhibiting a successful transfer to scenarios involving a mobile skateboard. The code, trained models, and reproducible examples are available at the following link: https://github.com/dancher00/quadruped-skateboard-mounting
Work in Progress: Middleware-Transparent Callback Enforcement in Commoditized Component-Oriented Real-time Systems
Real-time scheduling in commoditized component-oriented real-time systems, such as ROS 2 systems on Linux, has been studied under nested scheduling: OS thread scheduling and middleware layer scheduling (e.g., ROS 2 Executor). However, by establishing a persistent one-to-one correspondence between callbacks and OS threads, we can ignore the middleware layer and directly apply OS scheduling parameters (e.g., scheduling policy, priority, and affinity) to individual callbacks. We propose a middleware model that enables this idea and implements CallbackIsolatedExecutor as a novel ROS 2 Executor. We demonstrate that the costs (user-kernel switches, context switches, and memory usage) of CallbackIsolatedExecutor remain lower than those of the MultiThreadedExecutor, regardless of the number of callbacks. Additionally, the cost of CallbackIsolatedExecutor relative to SingleThreadedExecutor stays within a fixed ratio (1.4x for inter-process and 5x for intra-process communication). Future ROS 2 real-time scheduling research can avoid nested scheduling, ignoring the existence of the middleware layer.
comment: 4 pages, 5 figures. Accepted for IEEE RTAS 2025; this is the author-accepted manuscript. Final version in IEEE Xplore: https://doi.org/10.1109/RTAS65571.2025.00017
Edge-Enabled VIO with Long-Tracked Features for High-Accuracy Low-Altitude IoT Navigation
This paper presents a visual-inertial odometry (VIO) method using long-tracked features. Long-tracked features can constrain more visual frames, reducing localization drift. However, they may also lead to accumulated matching errors and drift in feature tracking. Current VIO methods adjust observation weights based on re-projection errors, yet this approach has flaws. Re-projection errors depend on estimated camera poses and map points, so increased errors might come from estimation inaccuracies, not actual feature tracking errors. This can mislead the optimization process and make long-tracked features ineffective for suppressing localization drift. Furthermore, long-tracked features constrain a larger number of frames, which poses a significant challenge to real-time performance of the system. To tackle these issues, we propose an active decoupling mechanism for accumulated errors in long-tracked feature utilization. We introduce a visual reference frame reset strategy to eliminate accumulated tracking errors and a depth prediction strategy to leverage the long-term constraint. To ensure real time preformane, we implement three strategies for efficient system state estimation: a parallel elimination strategy based on predefined elimination order, an inverse-depth elimination simplification strategy, and an elimination skipping strategy. Experiments on various datasets show that our method offers higher positioning accuracy with relatively short consumption time, making it more suitable for edge-enabled low-altitude IoT navigation, where high-accuracy positioning and real-time operation on edge device are required. The code will be published at github.
comment: 9 pages with 9 figures
LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus
Large Language Models (LLMs) have advanced rapidly in recent years, demonstrating strong capabilities in problem comprehension and reasoning. Inspired by these developments, researchers have begun exploring the use of LLMs as decentralized decision-makers for multi-robot formation control. However, prior studies reveal that directly applying LLMs to such tasks often leads to unstable and inconsistent behaviors, where robots may collapse to the centroid of their positions or diverge entirely due to hallucinated reasoning, logical inconsistencies, and limited coordination awareness. To overcome these limitations, we propose a novel framework that integrates LLMs with an influence-based plan consensus protocol. In this framework, each robot independently generates a local plan toward the desired formation using its own LLM. The robots then iteratively refine their plans through a decentralized consensus protocol that accounts for their influence on neighboring robots. This process drives the system toward a coherent and stable flocking formation in a fully decentralized manner. We evaluate our approach through comprehensive simulations involving both state-of-the-art closed-source LLMs (e.g., o3-mini, Claude 3.5) and open-source models (e.g., Llama3.1-405b, Qwen-Max, DeepSeek-R1). The results show notable improvements in stability, convergence, and adaptability over previous LLM-based methods. We further validate our framework on a physical team of Crazyflie drones, demonstrating its practical viability and effectiveness in real-world multi-robot systems.
CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments
Robot autonomy in unknown, GPS-denied, and complex underground environments requires real-time, robust, and accurate onboard pose estimation and mapping for reliable operations. This becomes particularly challenging in perception-degraded subterranean conditions under harsh environmental factors, including darkness, dust, and geometrically self-similar structures. This paper details CompSLAM, a highly resilient and hierarchical multi-modal localization and mapping framework designed to address these challenges. Its flexible architecture achieves resilience through redundancy by leveraging the complementary nature of pose estimates derived from diverse sensor modalities. Developed during the DARPA Subterranean Challenge, CompSLAM was successfully deployed on all aerial, legged, and wheeled robots of Team Cerberus during their competition-winning final run. Furthermore, it has proven to be a reliable odometry and mapping solution in various subsequent projects, with extensions enabling multi-robot map sharing for marsupial robotic deployments and collaborative mapping. This paper also introduces a comprehensive dataset acquired by a manually teleoperated quadrupedal robot, covering a significant portion of the DARPA Subterranean Challenge finals course. This dataset evaluates CompSLAM's robustness to sensor degradations as the robot traverses 740 meters in an environment characterized by highly variable geometries and demanding lighting conditions. The CompSLAM code and the DARPA SubT Finals dataset are made publicly available for the benefit of the robotics community
comment: 8 pages, 9 figures, Code: https://github.com/leggedrobotics/compslam_subt
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Offline reinforcement learning (RL) enables policy optimization in static datasets, avoiding the risks and costs of real-world exploration. However, it struggles with suboptimal behavior learning and inaccurate value estimation due to the lack of environmental interaction. In this paper, we present Video-Enhanced Offline RL (VeoRL), a model-based approach that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, VeoRL transfers commonsense knowledge of control policy and physical dynamics from natural videos to the RL agent within the target domain. Our method achieves substantial performance gains (exceeding 100% in some cases) across visuomotor control tasks in robotic manipulation, autonomous driving, and open-world video games.
Safety-Critical Formation Control of Non-Holonomic Multi-Robot Systems in Communication-Limited Environments
This paper introduces a decentralized estimator-based safety-critical controller designed for formation control of non-holonomic mobile robots operating in communication-constrained environments. The proposed framework integrates a robust state estimator capable of accurately reconstructing neighboring agents' velocity vectors and orientations under varying dynamic conditions, with a decentralized formation tracking controller that leverages Control Barrier Functions (CBFs) to guarantee collision avoidance and inter-agent safety. We present a closed-form control law that ensures both stability and string stability, effectively attenuating disturbances propagating from leader to followers. The theoretical foundations of the estimator and controller are established using Lyapunov stability analysis, which confirms global asymptotic stability under constant velocities and global uniformly ultimate boundedness under time-varying conditions. Extensive numerical simulations and realistic Gazebo-based experiments validate the effectiveness, robustness, and practical applicability of the proposed method, demonstrating precise formation tracking, stringent safety maintenance, and disturbance resilience without relying on inter-robot communication.
comment: Under review. Video demonstration: https://vimeo.com/1075016147/41612f2f8c
Perceptive Mixed-Integer Footstep Control for Underactuated Bipedal Walking on Rough Terrain
Traversing rough terrain requires dynamic bipeds to stabilize themselves through foot placement without stepping in unsafe areas. Planning these footsteps online is challenging given non-convexity of the safe terrain, and imperfect perception and state estimation. This paper addresses these challenges with a full-stack perception and control system for achieving underactuated walking on discontinuous terrain. First, we develop model-predictive footstep control (MPFC), a single mixed-integer quadratic program which assumes a convex polygon terrain decomposition to optimize over discrete foothold choice, footstep position, ankle torque, template dynamics, and footstep timing at over 100 Hz. We then propose a novel approach for generating convex polygon terrain decompositions online. Our perception stack decouples safe-terrain classification from fitting planar polygons, generating a temporally consistent terrain segmentation in real time using a single CPU thread. We demonstrate the performance of our perception and control stack through outdoor experiments with the underactuated biped Cassie, achieving state of the art perceptive bipedal walking on discontinuous terrain. Supplemental Video: https://youtu.be/JK16KJXJxi4
Estimation-Aware Trajectory Optimization with Set-Valued Measurement Uncertainties
In this paper, an optimization-based framework for generating estimation-aware trajectories is presented. In this setup, measurement (output) uncertainties are state-dependent and set-valued. Enveloping ellipsoids are employed to characterize state-dependent uncertainties with unknown distributions. The concept of regularity for set-valued output maps is then introduced, facilitating the formulation of the estimation-aware trajectory generation problem. Specifically, it is demonstrated that for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to the finite horizon state trajectories. By maximizing this measure, estimation-aware trajectories can then be synthesized for a broad class of systems. Trajectory planning routines are also examined in this work, by which the observability measure is optimized for systems with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, representative examples in the context of trajectory planning with vision-based estimation are presented. Moreover, the paper presents estimation-aware planning for an uncooperative Target-Rendezvous problem, where an Ego-satellite employs an onboard machine learning (ML)-based estimation module to realize the rendezvous trajectory.
comment: 40 pages, 9 figures
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is the lack of robotic data, which are typically obtained through expensive on-robot operation. A promising remedy is to leverage cheaper, off-domain data such as action-free videos, hand-drawn sketches or simulation data. In this work, we posit that hierarchical vision-language-action (VLA) models can be more effective in utilizing off-domain data than standard monolithic VLA models that directly finetune vision-language models (VLMs) to predict actions. In particular, we study a class of hierarchical VLA models, where the high-level VLM is finetuned to produce a coarse 2D path indicating the desired robot end-effector trajectory given an RGB image and a task description. The intermediate 2D path prediction is then served as guidance to the low-level, 3D-aware control policy capable of precise manipulation. Doing so alleviates the high-level VLM from fine-grained action prediction, while reducing the low-level policy's burden on complex task-level reasoning. We show that, with the hierarchical design, the high-level VLM can transfer across significant domain gaps between the off-domain finetuning data and real-robot testing scenarios, including differences on embodiments, dynamics, visual appearances and task semantics, etc. In the real-robot experiments, we observe an average of 20% improvement in success rate across seven different axes of generalization over OpenVLA, representing a 50% relative gain. Visual results, code, and dataset are provided at: https://hamster-robot.github.io/
comment: update related work and results on VQA benchmarks
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Teaching robots desired skills in real-world environments remains challenging, especially for non-experts. A key bottleneck is that collecting robotic data often requires expertise or specialized hardware, limiting accessibility and scalability. We posit that natural language offers an intuitive and accessible interface for robot learning. To this end, we study two aspects: (1) enabling non-experts to collect robotic data through natural language supervision (e.g., "move the arm to the right") and (2) training robot policies directly from this supervision. Specifically, we introduce a data collection framework that collects robot demonstrations based on natural language supervision and further augments these demonstrations. We then present CLIP-RT, a new vision-language-action (VLA) model that learns language-conditioned visuomotor policies from this supervision. CLIP-RT adapts the pretrained CLIP model and learns to predict language-based motion primitives via contrastive imitation learning. We train CLIP-RT on the Open X-Embodiment dataset and finetune it on in-domain data collected by our framework. In real-world evaluations, CLIP-RT demonstrates strong capabilities in learning novel manipulation skills, outperforming OpenVLA (7B parameters) by 24% in average success rates, while using 7x fewer parameters (1B). We further assess CLIP-RT's capabilities in few-shot generalization and collaborative scenarios involving large pretrained models or humans. In simulated environments, CLIP-RT also yields strong performance, achieving a 93.1% average success rate on the LIBERO benchmark with an inference throughput of 163 Hz.
comment: Accepted to RSS 2025. Project website: https://clip-rt.github.io
Barrier-Enhanced Parallel Homotopic Trajectory Optimization for Safety-Critical Autonomous Driving
Enforcing safety while preventing overly conservative behaviors is essential for autonomous vehicles to achieve high task performance. In this paper, we propose a barrier-enhanced parallel homotopic trajectory optimization (BPHTO) approach with the over-relaxed alternating direction method of multipliers (ADMM) for real-time integrated decision-making and planning. To facilitate safety interactions between the ego vehicle (EV) and surrounding vehicles, a spatiotemporal safety module exhibiting bi-convexity is developed on the basis of barrier function. Varying barrier coefficients are adopted for different time steps in a planning horizon to account for the motion uncertainties of surrounding HVs and mitigate conservative behaviors. Additionally, we exploit the discrete characteristics of driving maneuvers to initialize nominal behavior-oriented free-end homotopic trajectories based on reachability analysis, and each trajectory is locally constrained to a specific driving maneuver while sharing the same task objectives. By leveraging the bi-convexity of the safety module and the kinematics of the EV, we formulate the BPHTO as a bi-convex optimization problem. Then constraint transcription and the over-relaxed ADMM are employed to streamline the optimization process, such that multiple trajectories are generated in real time with feasibility guarantees. Through a series of experiments, the proposed development demonstrates improved task accuracy, stability, and consistency in various traffic scenarios using synthetic and real-world traffic datasets.
comment: 17 pages, 10 figures
Automotive Speed Estimation: Sensor Types and Error Characteristics from OBD-II to ADAS
Modern on-road navigation systems heavily depend on integrating speed measurements with inertial navigation systems (INS) and global navigation satellite systems (GNSS). Telemetry-based applications typically source speed data from the On-Board Diagnostic II (OBD-II) system. However, the method of deriving speed, as well as the types of sensors used to measure wheel speed, differs across vehicles. These differences result in varying error characteristics that must be accounted for in navigation and autonomy applications. This paper addresses this gap by examining the diverse speed-sensing technologies employed in standard automotive systems and alternative techniques used in advanced systems designed for higher levels of autonomy, such as Advanced Driver Assistance Systems (ADAS), Autonomous Driving (AD), or surveying applications. We propose a method to identify the type of speed sensor in a vehicle and present strategies for accurately modeling its error characteristics. To validate our approach, we collected and analyzed data from three long real road trajectories conducted in urban environments in Toronto and Kingston, Ontario, Canada. The results underscore the critical role of integrating multiple sensor modalities to achieve more accurate speed estimation, thus improving automotive navigation state estimation, particularly in GNSS-denied environments.
comment: 7 pages, 12 figures, to be published in IEEE/ION PLANS 2025
Multiagent Systems
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot reinforcement learning (MRRL) policies with realistic robot dynamics and safety constraints, supporting both parallelization and hardware acceleration. Our generalizable learning interface provides an easy-to-use integration with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a realistic robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation.
comment: 22 pages, 14 figures, 10 tables
Learning Graph Representation of Agent Diffuser AAMAS2025
Diffusion-based generative models have significantly advanced text-to-image synthesis, demonstrating impressive text comprehension and zero-shot generalization. These models refine images from random noise based on textual prompts, with initial reliance on text input shifting towards enhanced visual fidelity over time. This transition suggests that static model parameters might not optimally address the distinct phases of generation. We introduce LGR-AD (Learning Graph Representation of Agent Diffusers), a novel multi-agent system designed to improve adaptability in dynamic computer vision tasks. LGR-AD models the generation process as a distributed system of interacting agents, each representing an expert sub-model. These agents dynamically adapt to varying conditions and collaborate through a graph neural network that encodes their relationships and performance metrics. Our approach employs a coordination mechanism based on top-$k$ maximum spanning trees, optimizing the generation process. Each agent's decision-making is guided by a meta-model that minimizes a novel loss function, balancing accuracy and diversity. Theoretical analysis and extensive empirical evaluations show that LGR-AD outperforms traditional diffusion models across various benchmarks, highlighting its potential for scalable and flexible solutions in complex image generation tasks. Code is available at: https://github.com/YousIA/LGR_AD
comment: Accepted at AAMAS2025 International Conference on Autonomous Agents and Multiagent Systems
Emergent Multi-View Fidelity in Autonomous UAV Swarm Sport Injury Detection
Accurate, real-time collision detection is essential for ensuring player safety and effective refereeing in high-contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision-monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real time kinematics extraction from videos of collision events. In this paper, we show that the strategies operating on the objective of ensuring at least one UAV captures every incident on the pitch have an emergent property of fulfilling a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi-view fidelity and redundancy, while not requiring any drone-to-drone communication.
comment: Accepted for 2025 8th International Balkan Conference on Communications and Networking (Balkancom)
The evolutionary advantage of guilt: co-evolution of social and non-social guilt in structured populations
Building ethical machines may involve bestowing upon them the emotional capacity to self-evaluate and repent on their actions. While apologies represent potential strategic interactions, the explicit evolution of guilt as a behavioural trait remains poorly understood. Our study delves into the co-evolution of two forms of emotional guilt: social guilt entails a cost, requiring agents to exert efforts to understand others' internal states and behaviours; and non-social guilt, which only involves awareness of one's own state, incurs no social cost. Resorting to methods from evolutionary game theory, we study analytically, and through extensive numerical and agent-based simulations, whether and how guilt can evolve and deploy, depending on the underlying structure of the systems of agents. Our findings reveal that in lattice and scale-free networks, strategies favouring emotional guilt dominate a broader range of guilt and social costs compared to non-structured well-mixed populations, so leading to higher levels of cooperation. In structured populations, both social and non-social guilt can thrive through clustering with emotionally inclined strategies, thereby providing protection against exploiters, particularly for less costly non-social strategies. These insights shed light on the complex interplay of guilt and cooperation, enhancing our understanding of ethical artificial intelligence.
comment: 26 pages, 6 figures, in press with the Journal of the Royal Society Interface
Systems and Control (CS)
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and often researchers lack availability to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving to semi-fullscale validation with the R/V milliAmpere 1 passenger ferry and full-scale validation using the R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC) including autonomy.
Control Barrier Functions With Real-Time Gaussian Process Modeling
We present an approach for satisfying state constraints in systems with nonparametric uncertainty by estimating this uncertainty with a real-time-update Gaussian process (GP) model. Notably, new data is incorporated into the model in real time as it is obtained and select old data is removed from the model. This update process helps improve the model estimate while keeping the model size (memory required) and computational complexity fixed. We present a recursive formulation for the model update, which reduces time complexity of the update from O(p3) to O(p2), where p is the number of data used. The GP model includes a computable upper bound on the model error. Together, the model and upper bound are used to construct a control-barrier-function (CBF) constraint that guarantees state constraints are satisfied.
comment: 8 pages , 10 figures , conference version
Improving 5G/B5G Network Performance with RFID-Enabled Resource Management Systems
In the rapidly evolving landscape of 5G and B5G (beyond 5G) networks, efficient resource optimization is critical to addressing the escalating demands for high-speed, low-latency, and energy efficient communication. This study explores the integration of Radio Frequency Identification (RFID) technology as a novel approach to enhance resource management in 5G/B5G networks. The motivation behind this research lies in overcoming persistent challenges such as spectrum congestion, high latency, and inefficient load balancing, which impede the performance of traditional resource allocation methods. To achieve this, RFID tags were embedded in critical network components, including user devices, base stations, and Internet of Things (IoT) nodes, enabling the collection of real-time data on device status, location, and resource utilization. RFID readers strategically placed across the network continuously captured this data, which was processed by a centralized controller using a custom-designed optimization algorithm. This algorithm dynamically managed key network resources, including spectrum allocation, load balancing, and energy consumption, ensuring efficient operation under varying network conditions. Simulations were conducted to evaluate the performance of the RFID-based model against traditional 4G dynamic resource allocation techniques. The results demonstrated substantial improvements in key performance metrics.
comment: 13 pages, 7 figures, 5 tables
AI-CDA4All: Democratizing Cooperative Autonomous Driving for All Drivers via Affordable Dash-cam Hardware and Open-source AI Software
As transportation technology advances, the demand for connected vehicle infrastructure has greatly increased to improve their efficiency and safety. One area of advancement, Cooperative Driving Automation (CDA) still relies on expensive autonomy sensors or connectivity units and are not interoperable across existing market car makes/models, limiting its scalability on public roads. To fill these gaps, this paper presents a novel approach to democratizing CDA technology, it leverages low-cost, commercially available edge devices such as vehicle dash-cams and open-source software to make the technology accessible and scalable to be used in transportation infrastructure and broader public domains. This study also investigates the feasibility of utilizing cost-effective communication protocols based on LTE and WiFi. These technologies enable lightweight Vehicle-to-Everything (V2X) communications, facilitating real-time data exchange between vehicles and infrastructure. Our research and development efforts are aligned with industrial standards to ensure compatibility and future integration into existing transportation ecosystems. By prioritizing infrastructure-oriented applications, such as improved traffic flow management, this approach seeks to deliver tangible societal benefits without directly competing with vehicle OEMs. As recent advancement of Generative AI (GenAI), there is no standardized integration of GenAI technologies into open-source CDAs, as the current trends of muiltimodal large language models gain popularity, we demonstrated a feasible locally deployed edge LLM models can enhance driving experience while preserving privacy and security compared to cloud-connected solutions. The proposed system underscores the potential of low-cost, scalable solutions in advancing CDA functionality, paving the way for smarter, safer, and more inclusive transportation networks.
comment: 8 pages, 10 figures
Distributed Event-Triggered Nash Equilibrium Seeking for Noncooperative Games
We propose locally convergent Nash equilibrium seeking algorithms for $N$-player noncooperative games, which use distributed event-triggered pseudo-gradient estimates. The proposed approach employs sinusoidal perturbations to estimate the pseudo-gradients of unknown quadratic payoff functions. This is the first instance of noncooperative games being tackled in a model-free fashion with event-triggered extremum seeking. Each player evaluates independently the deviation between the corresponding current pseudo-gradient estimate and its last broadcasted value from the event-triggering mechanism to tune individually the player action, while they preserve collectively the closed-loop stability/convergence. We guarantee Zeno behavior avoidance by establishing a minimum dwell-time to avoid infinitely fast switching. In particular, the stability analysis is carried out using Lyapunov's method and averaging for systems with discontinuous right-hand sides. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an oligopoly setting.
A Survey on Data-Driven Modeling of Human Drivers' Lane-Changing Decisions
Lane-changing (LC) behavior, a critical yet complex driving maneuver, significantly influences driving safety and traffic dynamics. Traditional analytical LC decision (LCD) models, while effective in specific environments, often oversimplify behavioral heterogeneity and complex interactions, limiting their capacity to capture real LCD. Data-driven approaches address these gaps by leveraging rich empirical data and machine learning to decode latent decision-making patterns, enabling adaptive LCD modeling in dynamic environments. In light of the rapid development of artificial intelligence and the demand for data-driven models oriented towards connected vehicles and autonomous vehicles, this paper presents a comprehensive survey of data-driven LCD models, with a particular focus on human drivers LC decision-making. It systematically reviews the modeling framework, covering data sources and preprocessing, model inputs and outputs, objectives, structures, and validation methods. This survey further discusses the opportunities and challenges faced by data-driven LCD models, including driving safety, uncertainty, as well as the integration and improvement of technical frameworks.
Dynamic feedback linearization of two-input control systems via successive one-fold prolongations
In this paper, we propose a constructive algorithm to dynamically linearize two-input control systems via successive one-fold prolongations of a control that has to be suitably chosen at each step of the algorithm. Linearization via successive one-fold prolongations requires special properties of the linearizability distributions $\mathcal{D}^0 \subset \mathcal{D}^1 \subset\mathcal{D}^2 \subset \cdots$. Contrary to the case of static feedback linearizability, they need not be involutive but the first noninvolutive one has to contain an involutive subdistribution of corank one. The main idea of the proposed algorithm is to replace, at each step, the first noninvolutive distribution by its involutive subdistribution of corank one, thus for the prolonged system we gain at least one new involutive distribution. Our algorithm is constructive, gives sufficient conditions for flatness, and can be seen as the dual of the dynamic feedback linearization algorithm of Battilotti and Califano [2003, 2005].
A Novel Inverter Control Strategy with Power Decoupling for Microgrid Operations in Grid-Connected and Islanded Modes
Grid-forming, particularly those utilizing droop control and virtual synchronous generators (VSG), can actively regulate the frequency and voltage of microgrid systems, exhibiting dynamic characteristics akin to those of synchronous generators. Although droop control and VSG control each have distinct benefits, neither can fully meet the diverse, dynamic needs of both grid-connected (GC) and islanded (IS) modes. Additionally, the coupling between active and reactive power can negatively impact microgrids' dynamic performance and stability. To solve these problems, this paper introduces a unified dynamic power coupling (UDC) model. This model's active power control loop can be tailored to meet diverse requirements. By implementing a well-designed control loop, the system can harness the advantages of both droop control and VSG control. In islanded mode, the proposed model can provide virtual inertia and damping properties, while in grid-connected mode, the inverter's active power output can follow the changed references without significant overshoot or oscillation. Furthermore, the model incorporates coupling compensation and virtual impedance based on a relative gain array in the frequency domain to facilitate quantitative analysis of power coupling characteristics. This paper outlines a distinct design process for the unified model. Finally, the proposed control method has been validated through simulation.
Mixer-Informer-Based Two-Stage Transfer Learning for Long-Sequence Load Forecasting in Newly Constructed Electric Vehicle Charging Stations
The rapid rise in electric vehicle (EV) adoption demands precise charging station load forecasting, challenged by long-sequence temporal dependencies and limited data in new facilities. This study proposes MIK-TST, a novel two-stage transfer learning framework integrating Mixer, Informer, and Kolmogorov-Arnold Networks (KAN). The Mixer fuses multi-source features, Informer captures long-range dependencies via ProbSparse attention, and KAN enhances nonlinear modeling with learnable activation functions. Pre-trained on extensive data and fine-tuned on limited target data, MIK-TST achieves 4% and 8% reductions in MAE and MSE, respectively, outperforming baselines on a dataset of 26 charging stations in Boulder, USA. This scalable solution enhances smart grid efficiency and supports sustainable EV infrastructure expansion.
comment: 10 Pages
Emergent Multi-View Fidelity in Autonomous UAV Swarm Sport Injury Detection
Accurate, real-time collision detection is essential for ensuring player safety and effective refereeing in high-contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision-monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real time kinematics extraction from videos of collision events. In this paper, we show that the strategies operating on the objective of ensuring at least one UAV captures every incident on the pitch have an emergent property of fulfilling a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi-view fidelity and redundancy, while not requiring any drone-to-drone communication.
comment: Accepted for 2025 8th International Balkan Conference on Communications and Networking (Balkancom)
On the Existence of Lagrange Multipliers in Distribution Network Reconfiguration Problems
Distribution network reconfiguration (DNR) is an effective approach for optimizing distribution network operation. However, the DNR problem is computationally challenging due to the mixed-integer non-convex nature. One feasible approach for addressing this challenge is to reformulate binary line on/off state variables as (continuous) non-convex constraints, leading to a nonconvex program. Unfortunately, it remains unclear whether this formulation satisfies the Karush-Kuhn-Tucker (KKT) conditions at locally optimal solutions. In this brief, we study the existence of Lagrange multipliers in DNR problems and prove that under mild assumptions, Lagrange multipliers exist for the DNR model at every locally optimal solution almost surely such that the KKT conditions hold.
comment: 8 pages
Joint Parameterization of Hybrid Physics-Based and Machine Learning Li-Ion Battery Model
Electrochemical hybrid battery models have major potential to enable advanced physics-based control, diagnostic, and prognostic features for next-generation lithium-ion battery management systems. This is due to the physical significance of the electrochemical model, which is complemented by a machine learning model that compensates for output prediction errors caused by system uncertainties. While hybrid models have demonstrated robust output prediction performance under large system uncertainties, they are highly susceptible to the influence of uncertainties during parameter identification, which can compromise the physical significance of the model. To address this challenge, we present a parameter estimation framework that explicitly considers system uncertainties through a discrepancy function. The approach also incorporates a downsampling procedure to address the computational barriers associated with large time-series data sets, as are typical in the battery domain. The framework was validated in simulation, yielding several mean parameter estimation errors that were one order of magnitude smaller than those of the conventional least squares approach. While developed for the high-uncertainty, electrochemical hybrid modeling context, the estimation framework is applicable to all models and is presented in a generalized form.
comment: 7 pages, 2 figures, to be published in the 2025 American Control Conference proceedings
Safety-Critical Formation Control of Non-Holonomic Multi-Robot Systems in Communication-Limited Environments
This paper introduces a decentralized estimator-based safety-critical controller designed for formation control of non-holonomic mobile robots operating in communication-constrained environments. The proposed framework integrates a robust state estimator capable of accurately reconstructing neighboring agents' velocity vectors and orientations under varying dynamic conditions, with a decentralized formation tracking controller that leverages Control Barrier Functions (CBFs) to guarantee collision avoidance and inter-agent safety. We present a closed-form control law that ensures both stability and string stability, effectively attenuating disturbances propagating from leader to followers. The theoretical foundations of the estimator and controller are established using Lyapunov stability analysis, which confirms global asymptotic stability under constant velocities and global uniformly ultimate boundedness under time-varying conditions. Extensive numerical simulations and realistic Gazebo-based experiments validate the effectiveness, robustness, and practical applicability of the proposed method, demonstrating precise formation tracking, stringent safety maintenance, and disturbance resilience without relying on inter-robot communication.
comment: Under review. Video demonstration: https://vimeo.com/1075016147/41612f2f8c
Application of a Novel Model Reduction Technique to the Assessment of Boundedness/Stability of Some Delay Time-Varying Vector Nonlinear Systems
Assessing the boundedness and stability of vector nonlinear systems with variable delays and coefficients remains a challenging problem with broad applications in science and engineering. Existing methods tend to produce overly conservative criteria that offer limited practical value and often fail to explicitly characterize the temporal evolution of solution norms. This paper presents a novel framework for evaluating the evolution of solution norms in such systems. This approach constructs scalar counterparts for the original vector equations. We prove that the solutions to these scalar nonlinear equations, which also include delays and variable coefficients, provide upper bounds for the norms of the original solutions, if the history functions for both equations are properly matched. This reduction enables the evaluation of the boundedness and stability of vector systems through the analysis of the dynamics of their scalar counterparts, which can be performed via straightforward simulations or simplified analytical reasoning. Consequently, we introduce new criteria for boundedness and stability and estimate the radii of the balls containing history functions that yield bounded or stable solutions for the original vector systems. Finally, we validated our inferences through representative simulations that also assessed the accuracy of the proposed approach.
comment: 18 pages
Estimation-Aware Trajectory Optimization with Set-Valued Measurement Uncertainties
In this paper, an optimization-based framework for generating estimation-aware trajectories is presented. In this setup, measurement (output) uncertainties are state-dependent and set-valued. Enveloping ellipsoids are employed to characterize state-dependent uncertainties with unknown distributions. The concept of regularity for set-valued output maps is then introduced, facilitating the formulation of the estimation-aware trajectory generation problem. Specifically, it is demonstrated that for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to the finite horizon state trajectories. By maximizing this measure, estimation-aware trajectories can then be synthesized for a broad class of systems. Trajectory planning routines are also examined in this work, by which the observability measure is optimized for systems with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, representative examples in the context of trajectory planning with vision-based estimation are presented. Moreover, the paper presents estimation-aware planning for an uncooperative Target-Rendezvous problem, where an Ego-satellite employs an onboard machine learning (ML)-based estimation module to realize the rendezvous trajectory.
comment: 40 pages, 9 figures
Data-Driven Yet Formal Policy Synthesis for Stochastic Nonlinear Dynamical Systems
The automated synthesis of control policies for stochastic dynamical systems presents significant challenges. A standard approach is to construct a finite-state abstraction of the continuous system, typically represented as a Markov decision process (MDP). However, generating abstractions is challenging when (1) the system's dynamics are nonlinear, and/or (2) we do not have complete knowledge of the dynamics. In this work, we introduce a novel data-driven abstraction technique for nonlinear Lipschitz continuous dynamical systems with additive stochastic noise that addresses both of these issues. As a key step, we use samples of the dynamics to learn the enabled actions and transition probabilities of the abstraction. We represent abstractions as MDPs with intervals of transition probabilities, known as interval MDPs (IMDPs). These abstractions enable the synthesis of policies for the concrete nonlinear system, with probably approximately correct (PAC) guarantees on the probability of satisfying a specified control objective. Our numerical experiments illustrate the effectiveness and robustness of our approach in achieving reliable control under uncertainty.
A new framework for constrained optimization via feedback control of Lagrange multipliers
The continuous-time analysis of existing iterative algorithms for optimization has a long history. This work proposes a novel continuous-time control-theoretic framework for equality-constrained optimization. The key idea is to design a feedback control system where the Lagrange multipliers are the control input, and the output represents the constraints. The system converges to a stationary point of the constrained optimization problem through suitable regulation. Regarding the Lagrange multipliers, we consider two control laws: proportional-integral control and feedback linearization. These choices give rise to a family of different methods. We rigorously develop the related algorithms, theoretically analyze their convergence and present several numerical experiments to support their effectiveness concerning the state-of-the-art approaches.
comment: submitted for possible publication to IEEE Transactions on Automatic Control
Controlling Peak Sharpness in Multimodal Biomolecular Systems via the Chemical Fokker-Planck Equation
Intracellular biomolecular systems exhibit intrinsic stochasticity due to low molecular copy numbers, leading to multimodal probability distributions that play a crucial role in probabilistic differentiation and cellular decision-making. Controlling the dispersion of multimodal probability distributions in biomolecular systems is critical for regulating stochastic behavior, robustness, and adaptability. However, modifying system parameters to adjust dispersion often affects peak positions, potentially altering a desired phenotype or even fundamental behavior in a genetic pathway. In this paper, we establish a theoretical framework that enables independent control of dispersion while preserving peak positions and modality using the Chemical Fokker-Planck Equation (CFPE) and sharpness, a measure of probability concentration around individual peaks. By analyzing the steady-state solution of the CFPE, we derive explicit conditions under which peak sharpness can be tuned monotonically without changing peak positions or modality. We validate our approach through Monte Carlo simulations on a bimodal chemical system, demonstrating effective dispersion control while maintaining structural stability. This framework provides a systematic approach for designing biomolecular systems with tunable stochastic properties, contributing to advancements in synthetic biology and probabilistic cellular regulation.
Systems and Control (EESS)
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and often researchers lack availability to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving to semi-fullscale validation with the R/V milliAmpere 1 passenger ferry and full-scale validation using the R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC) including autonomy.
Control Barrier Functions With Real-Time Gaussian Process Modeling
We present an approach for satisfying state constraints in systems with nonparametric uncertainty by estimating this uncertainty with a real-time-update Gaussian process (GP) model. Notably, new data is incorporated into the model in real time as it is obtained and select old data is removed from the model. This update process helps improve the model estimate while keeping the model size (memory required) and computational complexity fixed. We present a recursive formulation for the model update, which reduces time complexity of the update from O(p3) to O(p2), where p is the number of data used. The GP model includes a computable upper bound on the model error. Together, the model and upper bound are used to construct a control-barrier-function (CBF) constraint that guarantees state constraints are satisfied.
comment: 8 pages , 10 figures , conference version
Improving 5G/B5G Network Performance with RFID-Enabled Resource Management Systems
In the rapidly evolving landscape of 5G and B5G (beyond 5G) networks, efficient resource optimization is critical to addressing the escalating demands for high-speed, low-latency, and energy efficient communication. This study explores the integration of Radio Frequency Identification (RFID) technology as a novel approach to enhance resource management in 5G/B5G networks. The motivation behind this research lies in overcoming persistent challenges such as spectrum congestion, high latency, and inefficient load balancing, which impede the performance of traditional resource allocation methods. To achieve this, RFID tags were embedded in critical network components, including user devices, base stations, and Internet of Things (IoT) nodes, enabling the collection of real-time data on device status, location, and resource utilization. RFID readers strategically placed across the network continuously captured this data, which was processed by a centralized controller using a custom-designed optimization algorithm. This algorithm dynamically managed key network resources, including spectrum allocation, load balancing, and energy consumption, ensuring efficient operation under varying network conditions. Simulations were conducted to evaluate the performance of the RFID-based model against traditional 4G dynamic resource allocation techniques. The results demonstrated substantial improvements in key performance metrics.
comment: 13 pages, 7 figures, 5 tables
AI-CDA4All: Democratizing Cooperative Autonomous Driving for All Drivers via Affordable Dash-cam Hardware and Open-source AI Software
As transportation technology advances, the demand for connected vehicle infrastructure has greatly increased to improve their efficiency and safety. One area of advancement, Cooperative Driving Automation (CDA) still relies on expensive autonomy sensors or connectivity units and are not interoperable across existing market car makes/models, limiting its scalability on public roads. To fill these gaps, this paper presents a novel approach to democratizing CDA technology, it leverages low-cost, commercially available edge devices such as vehicle dash-cams and open-source software to make the technology accessible and scalable to be used in transportation infrastructure and broader public domains. This study also investigates the feasibility of utilizing cost-effective communication protocols based on LTE and WiFi. These technologies enable lightweight Vehicle-to-Everything (V2X) communications, facilitating real-time data exchange between vehicles and infrastructure. Our research and development efforts are aligned with industrial standards to ensure compatibility and future integration into existing transportation ecosystems. By prioritizing infrastructure-oriented applications, such as improved traffic flow management, this approach seeks to deliver tangible societal benefits without directly competing with vehicle OEMs. As recent advancement of Generative AI (GenAI), there is no standardized integration of GenAI technologies into open-source CDAs, as the current trends of muiltimodal large language models gain popularity, we demonstrated a feasible locally deployed edge LLM models can enhance driving experience while preserving privacy and security compared to cloud-connected solutions. The proposed system underscores the potential of low-cost, scalable solutions in advancing CDA functionality, paving the way for smarter, safer, and more inclusive transportation networks.
comment: 8 pages, 10 figures
Distributed Event-Triggered Nash Equilibrium Seeking for Noncooperative Games
We propose locally convergent Nash equilibrium seeking algorithms for $N$-player noncooperative games, which use distributed event-triggered pseudo-gradient estimates. The proposed approach employs sinusoidal perturbations to estimate the pseudo-gradients of unknown quadratic payoff functions. This is the first instance of noncooperative games being tackled in a model-free fashion with event-triggered extremum seeking. Each player evaluates independently the deviation between the corresponding current pseudo-gradient estimate and its last broadcasted value from the event-triggering mechanism to tune individually the player action, while they preserve collectively the closed-loop stability/convergence. We guarantee Zeno behavior avoidance by establishing a minimum dwell-time to avoid infinitely fast switching. In particular, the stability analysis is carried out using Lyapunov's method and averaging for systems with discontinuous right-hand sides. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an oligopoly setting.
A Survey on Data-Driven Modeling of Human Drivers' Lane-Changing Decisions
Lane-changing (LC) behavior, a critical yet complex driving maneuver, significantly influences driving safety and traffic dynamics. Traditional analytical LC decision (LCD) models, while effective in specific environments, often oversimplify behavioral heterogeneity and complex interactions, limiting their capacity to capture real LCD. Data-driven approaches address these gaps by leveraging rich empirical data and machine learning to decode latent decision-making patterns, enabling adaptive LCD modeling in dynamic environments. In light of the rapid development of artificial intelligence and the demand for data-driven models oriented towards connected vehicles and autonomous vehicles, this paper presents a comprehensive survey of data-driven LCD models, with a particular focus on human drivers LC decision-making. It systematically reviews the modeling framework, covering data sources and preprocessing, model inputs and outputs, objectives, structures, and validation methods. This survey further discusses the opportunities and challenges faced by data-driven LCD models, including driving safety, uncertainty, as well as the integration and improvement of technical frameworks.
Dynamic feedback linearization of two-input control systems via successive one-fold prolongations
In this paper, we propose a constructive algorithm to dynamically linearize two-input control systems via successive one-fold prolongations of a control that has to be suitably chosen at each step of the algorithm. Linearization via successive one-fold prolongations requires special properties of the linearizability distributions $\mathcal{D}^0 \subset \mathcal{D}^1 \subset\mathcal{D}^2 \subset \cdots$. Contrary to the case of static feedback linearizability, they need not be involutive but the first noninvolutive one has to contain an involutive subdistribution of corank one. The main idea of the proposed algorithm is to replace, at each step, the first noninvolutive distribution by its involutive subdistribution of corank one, thus for the prolonged system we gain at least one new involutive distribution. Our algorithm is constructive, gives sufficient conditions for flatness, and can be seen as the dual of the dynamic feedback linearization algorithm of Battilotti and Califano [2003, 2005].
A Novel Inverter Control Strategy with Power Decoupling for Microgrid Operations in Grid-Connected and Islanded Modes
Grid-forming, particularly those utilizing droop control and virtual synchronous generators (VSG), can actively regulate the frequency and voltage of microgrid systems, exhibiting dynamic characteristics akin to those of synchronous generators. Although droop control and VSG control each have distinct benefits, neither can fully meet the diverse, dynamic needs of both grid-connected (GC) and islanded (IS) modes. Additionally, the coupling between active and reactive power can negatively impact microgrids' dynamic performance and stability. To solve these problems, this paper introduces a unified dynamic power coupling (UDC) model. This model's active power control loop can be tailored to meet diverse requirements. By implementing a well-designed control loop, the system can harness the advantages of both droop control and VSG control. In islanded mode, the proposed model can provide virtual inertia and damping properties, while in grid-connected mode, the inverter's active power output can follow the changed references without significant overshoot or oscillation. Furthermore, the model incorporates coupling compensation and virtual impedance based on a relative gain array in the frequency domain to facilitate quantitative analysis of power coupling characteristics. This paper outlines a distinct design process for the unified model. Finally, the proposed control method has been validated through simulation.
Mixer-Informer-Based Two-Stage Transfer Learning for Long-Sequence Load Forecasting in Newly Constructed Electric Vehicle Charging Stations
The rapid rise in electric vehicle (EV) adoption demands precise charging station load forecasting, challenged by long-sequence temporal dependencies and limited data in new facilities. This study proposes MIK-TST, a novel two-stage transfer learning framework integrating Mixer, Informer, and Kolmogorov-Arnold Networks (KAN). The Mixer fuses multi-source features, Informer captures long-range dependencies via ProbSparse attention, and KAN enhances nonlinear modeling with learnable activation functions. Pre-trained on extensive data and fine-tuned on limited target data, MIK-TST achieves 4% and 8% reductions in MAE and MSE, respectively, outperforming baselines on a dataset of 26 charging stations in Boulder, USA. This scalable solution enhances smart grid efficiency and supports sustainable EV infrastructure expansion.
comment: 10 Pages
Emergent Multi-View Fidelity in Autonomous UAV Swarm Sport Injury Detection
Accurate, real-time collision detection is essential for ensuring player safety and effective refereeing in high-contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision-monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real time kinematics extraction from videos of collision events. In this paper, we show that the strategies operating on the objective of ensuring at least one UAV captures every incident on the pitch have an emergent property of fulfilling a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi-view fidelity and redundancy, while not requiring any drone-to-drone communication.
comment: Accepted for 2025 8th International Balkan Conference on Communications and Networking (Balkancom)
On the Existence of Lagrange Multipliers in Distribution Network Reconfiguration Problems
Distribution network reconfiguration (DNR) is an effective approach for optimizing distribution network operation. However, the DNR problem is computationally challenging due to the mixed-integer non-convex nature. One feasible approach for addressing this challenge is to reformulate binary line on/off state variables as (continuous) non-convex constraints, leading to a nonconvex program. Unfortunately, it remains unclear whether this formulation satisfies the Karush-Kuhn-Tucker (KKT) conditions at locally optimal solutions. In this brief, we study the existence of Lagrange multipliers in DNR problems and prove that under mild assumptions, Lagrange multipliers exist for the DNR model at every locally optimal solution almost surely such that the KKT conditions hold.
comment: 8 pages
Joint Parameterization of Hybrid Physics-Based and Machine Learning Li-Ion Battery Model
Electrochemical hybrid battery models have major potential to enable advanced physics-based control, diagnostic, and prognostic features for next-generation lithium-ion battery management systems. This is due to the physical significance of the electrochemical model, which is complemented by a machine learning model that compensates for output prediction errors caused by system uncertainties. While hybrid models have demonstrated robust output prediction performance under large system uncertainties, they are highly susceptible to the influence of uncertainties during parameter identification, which can compromise the physical significance of the model. To address this challenge, we present a parameter estimation framework that explicitly considers system uncertainties through a discrepancy function. The approach also incorporates a downsampling procedure to address the computational barriers associated with large time-series data sets, as are typical in the battery domain. The framework was validated in simulation, yielding several mean parameter estimation errors that were one order of magnitude smaller than those of the conventional least squares approach. While developed for the high-uncertainty, electrochemical hybrid modeling context, the estimation framework is applicable to all models and is presented in a generalized form.
comment: 7 pages, 2 figures, to be published in the 2025 American Control Conference proceedings
Safety-Critical Formation Control of Non-Holonomic Multi-Robot Systems in Communication-Limited Environments
This paper introduces a decentralized estimator-based safety-critical controller designed for formation control of non-holonomic mobile robots operating in communication-constrained environments. The proposed framework integrates a robust state estimator capable of accurately reconstructing neighboring agents' velocity vectors and orientations under varying dynamic conditions, with a decentralized formation tracking controller that leverages Control Barrier Functions (CBFs) to guarantee collision avoidance and inter-agent safety. We present a closed-form control law that ensures both stability and string stability, effectively attenuating disturbances propagating from leader to followers. The theoretical foundations of the estimator and controller are established using Lyapunov stability analysis, which confirms global asymptotic stability under constant velocities and global uniformly ultimate boundedness under time-varying conditions. Extensive numerical simulations and realistic Gazebo-based experiments validate the effectiveness, robustness, and practical applicability of the proposed method, demonstrating precise formation tracking, stringent safety maintenance, and disturbance resilience without relying on inter-robot communication.
comment: Under review. Video demonstration: https://vimeo.com/1075016147/41612f2f8c
Application of a Novel Model Reduction Technique to the Assessment of Boundedness/Stability of Some Delay Time-Varying Vector Nonlinear Systems
Assessing the boundedness and stability of vector nonlinear systems with variable delays and coefficients remains a challenging problem with broad applications in science and engineering. Existing methods tend to produce overly conservative criteria that offer limited practical value and often fail to explicitly characterize the temporal evolution of solution norms. This paper presents a novel framework for evaluating the evolution of solution norms in such systems. This approach constructs scalar counterparts for the original vector equations. We prove that the solutions to these scalar nonlinear equations, which also include delays and variable coefficients, provide upper bounds for the norms of the original solutions, if the history functions for both equations are properly matched. This reduction enables the evaluation of the boundedness and stability of vector systems through the analysis of the dynamics of their scalar counterparts, which can be performed via straightforward simulations or simplified analytical reasoning. Consequently, we introduce new criteria for boundedness and stability and estimate the radii of the balls containing history functions that yield bounded or stable solutions for the original vector systems. Finally, we validated our inferences through representative simulations that also assessed the accuracy of the proposed approach.
comment: 18 pages
Estimation-Aware Trajectory Optimization with Set-Valued Measurement Uncertainties
In this paper, an optimization-based framework for generating estimation-aware trajectories is presented. In this setup, measurement (output) uncertainties are state-dependent and set-valued. Enveloping ellipsoids are employed to characterize state-dependent uncertainties with unknown distributions. The concept of regularity for set-valued output maps is then introduced, facilitating the formulation of the estimation-aware trajectory generation problem. Specifically, it is demonstrated that for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to the finite horizon state trajectories. By maximizing this measure, estimation-aware trajectories can then be synthesized for a broad class of systems. Trajectory planning routines are also examined in this work, by which the observability measure is optimized for systems with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, representative examples in the context of trajectory planning with vision-based estimation are presented. Moreover, the paper presents estimation-aware planning for an uncooperative Target-Rendezvous problem, where an Ego-satellite employs an onboard machine learning (ML)-based estimation module to realize the rendezvous trajectory.
comment: 40 pages, 9 figures
Data-Driven Yet Formal Policy Synthesis for Stochastic Nonlinear Dynamical Systems
The automated synthesis of control policies for stochastic dynamical systems presents significant challenges. A standard approach is to construct a finite-state abstraction of the continuous system, typically represented as a Markov decision process (MDP). However, generating abstractions is challenging when (1) the system's dynamics are nonlinear, and/or (2) we do not have complete knowledge of the dynamics. In this work, we introduce a novel data-driven abstraction technique for nonlinear Lipschitz continuous dynamical systems with additive stochastic noise that addresses both of these issues. As a key step, we use samples of the dynamics to learn the enabled actions and transition probabilities of the abstraction. We represent abstractions as MDPs with intervals of transition probabilities, known as interval MDPs (IMDPs). These abstractions enable the synthesis of policies for the concrete nonlinear system, with probably approximately correct (PAC) guarantees on the probability of satisfying a specified control objective. Our numerical experiments illustrate the effectiveness and robustness of our approach in achieving reliable control under uncertainty.
A new framework for constrained optimization via feedback control of Lagrange multipliers
The continuous-time analysis of existing iterative algorithms for optimization has a long history. This work proposes a novel continuous-time control-theoretic framework for equality-constrained optimization. The key idea is to design a feedback control system where the Lagrange multipliers are the control input, and the output represents the constraints. The system converges to a stationary point of the constrained optimization problem through suitable regulation. Regarding the Lagrange multipliers, we consider two control laws: proportional-integral control and feedback linearization. These choices give rise to a family of different methods. We rigorously develop the related algorithms, theoretically analyze their convergence and present several numerical experiments to support their effectiveness concerning the state-of-the-art approaches.
comment: submitted for possible publication to IEEE Transactions on Automatic Control
Controlling Peak Sharpness in Multimodal Biomolecular Systems via the Chemical Fokker-Planck Equation
Intracellular biomolecular systems exhibit intrinsic stochasticity due to low molecular copy numbers, leading to multimodal probability distributions that play a crucial role in probabilistic differentiation and cellular decision-making. Controlling the dispersion of multimodal probability distributions in biomolecular systems is critical for regulating stochastic behavior, robustness, and adaptability. However, modifying system parameters to adjust dispersion often affects peak positions, potentially altering a desired phenotype or even fundamental behavior in a genetic pathway. In this paper, we establish a theoretical framework that enables independent control of dispersion while preserving peak positions and modality using the Chemical Fokker-Planck Equation (CFPE) and sharpness, a measure of probability concentration around individual peaks. By analyzing the steady-state solution of the CFPE, we derive explicit conditions under which peak sharpness can be tuned monotonically without changing peak positions or modality. We validate our approach through Monte Carlo simulations on a bimodal chemical system, demonstrating effective dispersion control while maintaining structural stability. This framework provides a systematic approach for designing biomolecular systems with tunable stochastic properties, contributing to advancements in synthetic biology and probabilistic cellular regulation.
Multiagent Systems
Robust Multi-Agent Decision-Making in Finite-Population Games
We study the robustness of an agent decision-making model in finite-population games, with a particular focus on the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model. Specifically, we examine how the model's parameters influence the effects of various sources of noise and modeling inaccuracies -- factors commonly encountered in engineering applications of population games -- on agents' decision-making. Our analysis provides insights into how these parameters can be effectively tuned to mitigate such effects. Theoretical results are supported by numerical examples and simulation studies that validate the analysis and illustrate practical strategies for parameter selection.
Offline Multi-agent Reinforcement Learning via Score Decomposition
Offline multi-agent reinforcement learning (MARL) faces critical challenges due to distributional shifts, further exacerbated by the high dimensionality of joint action spaces and the diversity in coordination strategies and quality among agents. Conventional approaches, including independent learning frameworks and value decomposition methods based on pessimistic principles, remain susceptible to out-of-distribution (OOD) joint actions and often yield suboptimal performance. Through systematic analysis of prevalent offline MARL benchmarks, we identify that this limitation primarily stems from the inherently multimodal nature of joint collaborative policies induced by offline data collection. To address these challenges, we propose a novel two-stage framework: First, we employ a diffusion-based generative model to explicitly capture the complex behavior policy, enabling accurate modeling of diverse multi-agent coordination patterns. Second, we introduce a sequential score function decomposition mechanism to regularize individual policies and enable decentralized execution. Extensive experiments on continuous control tasks demonstrate state-of-the-art performance across multiple standard offline MARL benchmarks, outperforming existing methods by 26.3\% in normalized returns. Our approach provides new insights into offline coordination and equilibrium selection in cooperative multi-agent systems.
comment: Working papers
Assessing the Dynamics of the Coffee Value Chain in Davao del Sur: An Agent-Based Modeling Approach
The study investigates the coffee value chain dynamics in Davao del Sur using an agent-based model. Three main factors driving interactions among key players were identified: trust, risk, and transaction costs. The model was constructed using NetLogo 6.3.0, and data from a survey questionnaire collected three data points from BACOFA members. Five cases were explored, with each scenario simulated 1000 times. Findings suggest that producers often sell to the market rather than the cooperative due to higher prices. However, producers tend to prioritize trust in buyers and their risk attitude, leading to increased sales to the cooperative. The producer's risk attitude significantly influences their decision-making, affecting performance outcomes such as loans, demand, and price changes. All three factors play a role and exert varying impacts on the value chain. So, the stakeholders' decisions on prioritizing factors in improving relationships depend on their priorities. Nonetheless, simulations show that establishing a harmonious system benefiting all parties is possible. However, achieving this requires adjustments to demand, pricing, trust, and risk attitudes of key players, which may not align with the preferences of some parties in reality.
comment: 56 pages, 12 figures, 7 tables
Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models
As AI models scale to billions of parameters and operate with increasing autonomy, ensuring their safe, reliable operation demands engineering-grade security and assurance frameworks. This paper presents an enterprise-level, risk-aware, security-by-design approach for large-scale autonomous AI systems, integrating standardized threat metrics, adversarial hardening techniques, and real-time anomaly detection into every phase of the development lifecycle. We detail a unified pipeline - from design-time risk assessments and secure training protocols to continuous monitoring and automated audit logging - that delivers provable guarantees of model behavior under adversarial and operational stress. Case studies in national security, open-source model governance, and industrial automation demonstrate measurable reductions in vulnerability and compliance overhead. Finally, we advocate cross-sector collaboration - uniting engineering teams, standards bodies, and regulatory agencies - to institutionalize these technical safeguards within a resilient, end-to-end assurance ecosystem for the next generation of AI.
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation
In real-world multi-agent systems (MASs), observation delays are ubiquitous, preventing agents from making decisions based on the environment's true state. An individual agent's local observation often consists of multiple components from other agents or dynamic entities in the environment. These discrete observation components with varying delay characteristics pose significant challenges for multi-agent reinforcement learning (MARL). In this paper, we first formulate the decentralized stochastic individual delay partially observable Markov decision process (DSID-POMDP) by extending the standard Dec-POMDP. We then propose the Rainbow Delay Compensation (RDC), a MARL training framework for addressing stochastic individual delays, along with recommended implementations for its constituent modules. We implement the DSID-POMDP's observation generation pattern using standard MARL benchmarks, including MPE and SMAC. Experiments demonstrate that baseline MARL methods suffer severe performance degradation under fixed and unfixed delays. The RDC-enhanced approach mitigates this issue, remarkably achieving ideal delay-free performance in certain delay scenarios while maintaining generalizability. Our work provides a novel perspective on multi-agent delayed observation problems and offers an effective solution framework. The source code is available at https://anonymous.4open.science/r/RDC-pymarl-4512/.
comment: The code will be open-sourced in the RDC-pymarl project under https://github.com/linkjoker1006
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Urban causal research is essential for understanding the complex dynamics of cities and informing evidence-based policies. However, it is challenged by the inefficiency and bias of hypothesis generation, barriers to multimodal data complexity, and the methodological fragility of causal experimentation. Recent advances in large language models (LLMs) present an opportunity to rethink how urban causal analysis is conducted. This Perspective examines current urban causal research by analyzing taxonomies that categorize research topics, data sources, and methodological approaches to identify structural gaps. We then introduce an LLM-driven conceptual framework, AutoUrbanCI, composed of four distinct modular agents responsible for hypothesis generation, data engineering, experiment design and execution, and results interpretation with policy recommendations. We propose evaluation criteria for rigor and transparency and reflect on implications for human-AI collaboration, equity, and accountability. We call for a new research agenda that embraces AI-augmented workflows not as replacements for human expertise but as tools to broaden participation, improve reproducibility, and unlock more inclusive forms of urban causal reasoning.
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles
For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective framework for solving such games. However, current methods initialize a new policy from scratch or inherit a single historical policy in Best Response (BR), missing the opportunity to leverage past policies to generate a better BR. In this paper, we propose Fusion-PSRO, which employs Nash Policy Fusion to initialize a new policy for BR training. Nash Policy Fusion serves as an implicit guiding policy that starts exploration on the current Meta-NE, thus providing a closer approximation to BR. Moreover, it insightfully captures a weighted moving average of past policies, dynamically adjusting these weights based on the Meta-NE in each iteration. This cumulative process further enhances the policy population. Empirical results on classic benchmarks show that Fusion-PSRO achieves lower exploitability, thereby mitigating the shortcomings of previous research on policy initialization in BR.
comment: 11 pages, 11 figures
AVA: Attentive VLM Agent for Mastering StarCraft II
We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Traditional frameworks such as SMAC rely on abstract state representations that diverge significantly from human perception, limiting the ecological validity of agent behavior. Our agent addresses this limitation by incorporating RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay. The AVA architecture consists of three integrated components: (1) a vision-language model enhanced with specialized self-attention mechanisms for strategic unit targeting and battlefield assessment, (2) a retrieval-augmented generation system that leverages domain-specific StarCraft II knowledge to inform tactical decisions, and (3) a dynamic role-based task distribution system that enables coordinated multi-agent behavior. The experimental evaluation in our proposed AVACraft environment, which contains 21 multimodal StarCraft II scenarios, demonstrates that AVA powered by foundation models (specifically Qwen-VL and GPT-4o) can execute complex tactical maneuvers without explicit training, achieving comparable performance to traditional MARL methods that require substantial training iterations. This work establishes a foundation for developing human-aligned StarCraft II agents and advances the broader research agenda of multimodal game AI. Our implementation is available at https://github.com/camel-ai/VLM-Play-StarCraft2.
comment: Under Review
The Hidden Bloat in Machine Learning Systems
Software bloat refers to code and features that is not used by a software during runtime. For Machine Learning (ML) systems, bloat is a major contributor to their technical debt leading to decreased performance and resource wastage. In this work, we present, Negativa-ML, a novel tool to identify and remove bloat in ML frameworks by analyzing their shared libraries. Our approach includes novel techniques to detect and locate unnecessary code within device code - a key area overlooked by existing research, which focuses primarily on host code. We evaluate Negativa-ML using four popular ML frameworks across ten workloads over 300 shared libraries. The results demonstrate that the ML frameworks are highly bloated on both the device and host code side. On average, Negativa-ML reduces the device code size in these frameworks by up to 75% and the host code by up to 72%, resulting in total file size reductions of up to 55%. The device code is a primary source of bloat within ML frameworks. Through debloating, we achieve reductions in peak host memory usage, peak GPU memory usage, and execution time by up to 74.6%, 69.6%, and 44.6%, respectively.
Robotics
VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction
Next Best View (NBV) algorithms aim to acquire an optimal set of images using minimal resources, time, or number of captures to enable efficient 3D reconstruction of a scene. Existing approaches often rely on prior scene knowledge or additional image captures and often develop policies that maximize coverage. Yet, for many real scenes with complex geometry and self-occlusions, coverage maximization does not lead to better reconstruction quality directly. In this paper, we propose the View Introspection Network (VIN), which is trained to predict the reconstruction quality improvement of views directly, and the VIN-NBV policy. A greedy sequential sampling-based policy, where at each acquisition step, we sample multiple query views and choose the one with the highest VIN predicted improvement score. We design the VIN to perform 3D-aware featurization of the reconstruction built from prior acquisitions, and for each query view create a feature that can be decoded into an improvement score. We then train the VIN using imitation learning to predict the reconstruction improvement score. We show that VIN-NBV improves reconstruction quality by ~30% over a coverage maximization baseline when operating with constraints on the number of acquisitions or the time in motion.
comment: 19 pages, 11 figures
Let Humanoids Hike! Integrative Skill Development on Complex Trails CVPR 2025
Hiking on complex trails demands balance, agility, and adaptive decision-making over unpredictable terrain. Current humanoid research remains fragmented and inadequate for hiking: locomotion focuses on motor skills without long-term goals or situational awareness, while semantic navigation overlooks real-world embodiment and local terrain variability. We propose training humanoids to hike on complex trails, driving integrative skill development across visual perception, decision making, and motor execution. We develop a learning framework, LEGO-H, that enables a vision-equipped humanoid robot to hike complex trails autonomously. We introduce two technical innovations: 1) A temporal vision transformer variant - tailored into Hierarchical Reinforcement Learning framework - anticipates future local goals to guide movement, seamlessly integrating locomotion with goal-directed navigation. 2) Latent representations of joint movement patterns, combined with hierarchical metric learning - enhance Privileged Learning scheme - enable smooth policy transfer from privileged training to onboard execution. These components allow LEGO-H to handle diverse physical and environmental challenges without relying on predefined motion patterns. Experiments across varied simulated trails and robot morphologies highlight LEGO-H's versatility and robustness, positioning hiking as a compelling testbed for embodied autonomy and LEGO-H as a baseline for future humanoid development.
comment: CVPR 2025. Project page: https://lego-h-humanoidrobothiking.github.io/
Neuro-Symbolic Concepts
This article presents a concept-centric paradigm for building agents that can learn continually and reason flexibly. The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts. These concepts, such as object, relation, and action concepts, are grounded on sensory inputs and actuation outputs. They are also compositional, allowing for the creation of novel concepts through their structural combination. To facilitate learning and reasoning, the concepts are typed and represented using a combination of symbolic programs and neural network representations. Leveraging such neuro-symbolic concepts, the agent can efficiently learn and recombine them to solve various tasks across different domains, ranging from 2D images, videos, 3D scenes, and robotic manipulation tasks. This concept-centric framework offers several advantages, including data efficiency, compositional generalization, continual learning, and zero-shot transfer.
comment: To appear in Communications of the ACM
Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
Humans make extensive use of haptic exploration to map and identify the properties of the objects that we touch. In robotics, active tactile perception has emerged as an important research domain that complements vision for tasks such as object classification, shape reconstruction, and manipulation. This work introduces TAP (Task-agnostic Active Perception) -- a novel framework that leverages reinforcement learning (RL) and transformer-based architectures to address the challenges posed by partially observable environments. TAP integrates Soft Actor-Critic (SAC) and CrossQ algorithms within a unified optimization objective, jointly training a perception module and decision-making policy. By design, TAP is completely task-agnostic and can, in principle, generalize to any active perception problem. We evaluate TAP across diverse tasks, including toy examples and realistic applications involving haptic exploration of 3D models from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of TAP, achieving high accuracies on the Tactile MNIST haptic digit recognition task and a tactile pose estimation task. These findings underscore the potential of TAP as a versatile and generalizable framework for advancing active tactile perception in robotics.
comment: 16 pages; 13 figures
ELA-ZSON: Efficient Layout-Aware Zero-Shot Object Navigation Agent with Hierarchical Planning
We introduce ELA-ZSON, an efficient layout-aware zero-shot object navigation (ZSON) approach designed for complex multi-room indoor environments. By planning hierarchically leveraging a global topologigal map with layout information and local imperative approach with detailed scene representation memory, ELA-ZSON achieves both efficient and effective navigation. The process is managed by an LLM-powered agent, ensuring seamless effective planning and navigation, without the need for human interaction, complex rewards, or costly training. Our experimental results on the MP3D benchmark achieves 85\% object navigation success rate (SR) and 79\% success rate weighted by path length (SPL) (over 40\% point improvement in SR and 60\% improvement in SPL compared to exsisting methods). Furthermore, we validate the robustness of our approach through virtual agent and real-world robotic deployment, showcasing its capability in practical scenarios. See https://anonymous.4open.science/r/ELA-ZSON-C67E/ for details.
KRRF: Kinodynamic Rapidly-exploring Random Forest algorithm for multi-goal motion planning
The problem of kinodynamic multi-goal motion planning is to find a trajectory over multiple target locations with an apriori unknown sequence of visits. The objective is to minimize the cost of the trajectory planned in a cluttered environment for a robot with a kinodynamic motion model. This problem has yet to be efficiently solved as it combines two NP-hard problems, the Traveling Salesman Problem~(TSP) and the kinodynamic motion planning problem. We propose a novel approximate method called Kinodynamic Rapidly-exploring Random Forest~(KRRF) to find a collision-free multi-goal trajectory that satisfies the motion constraints of the robot. KRRF simultaneously grows kinodynamic trees from all targets towards all other targets while using the other trees as a heuristic to boost the growth. Once the target-to-target trajectories are planned, their cost is used to solve the TSP to find the sequence of targets. The final multi-goal trajectory satisfying kinodynamic constraints is planned by guiding the RRT-based planner along the target-to-target trajectories in the TSP sequence. Compared with existing approaches, KRRF provides shorter target-to-target trajectories and final multi-goal trajectories with $1.1-2$ times lower costs while being computationally faster in most test cases. The method will be published as an open-source library.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
comment: Accepted to RSS 2025. Code is available at https://github.com/OpenDriveLab/UniVLA
Parameter-Free Segmentation of Robot Movements with Cross-Correlation Using Different Similarity Metrics
Often, robots are asked to execute primitive movements, whether as a single action or in a series of actions representing a larger, more complex task. These movements can be learned in many ways, but a common one is from demonstrations presented to the robot by a teacher. However, these demonstrations are not always simple movements themselves, and complex demonstrations must be broken down, or segmented, into primitive movements. In this work, we present a parameter-free approach to segmentation using techniques inspired by autocorrelation and cross-correlation from signal processing. In cross-correlation, a representative signal is found in some larger, more complex signal by correlating the representative signal with the larger signal. This same idea can be applied to segmenting robot motion and demonstrations, provided with a representative motion primitive. This results in a fast and accurate segmentation, which does not take any parameters. One of the main contributions of this paper is the modification of the cross-correlation process by employing similarity metrics that can capture features specific to robot movements. To validate our framework, we conduct several experiments of complex tasks both in simulation and in real-world. We also evaluate the effectiveness of our segmentation framework by comparing various similarity metrics.
comment: 7 pages, 5 figures. Accepted to UR 2025. Code available at https://github.com/PeARL-robotics/PFCS
Robot Learning Using Multi-Coordinate Elastic Maps
To learn manipulation skills, robots need to understand the features of those skills. An easy way for robots to learn is through Learning from Demonstration (LfD), where the robot learns a skill from an expert demonstrator. While the main features of a skill might be captured in one differential coordinate (i.e., Cartesian), they could have meaning in other coordinates. For example, an important feature of a skill may be its shape or velocity profile, which are difficult to discover in Cartesian differential coordinate. In this work, we present a method which enables robots to learn skills from human demonstrations via encoding these skills into various differential coordinates, then determines the importance of each coordinate to reproduce the skill. We also introduce a modified form of Elastic Maps that includes multiple differential coordinates, combining statistical modeling of skills in these differential coordinate spaces. Elastic Maps, which are flexible and fast to compute, allow for the incorporation of several different types of constraints and the use of any number of demonstrations. Additionally, we propose methods for auto-tuning several parameters associated with the modified Elastic Map formulation. We validate our approach in several simulated experiments and a real-world writing task with a UR5e manipulator arm.
comment: 7 pages, 6 figures. Accepted to UR 2025. Code available at: https://github.com/brenhertel/MC-Elmap, Accompanying video at: https://youtu.be/KU-ldkTa9UE
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations ICRA 2025
Preference feedback collected by human or VLM annotators is often noisy, presenting a significant challenge for preference-based reinforcement learning that relies on accurate preference labels. To address this challenge, we propose TREND, a novel framework that integrates few-shot expert demonstrations with a tri-teaching strategy for effective noise mitigation. Our method trains three reward models simultaneously, where each model views its small-loss preference pairs as useful knowledge and teaches such useful pairs to its peer network for updating the parameters. Remarkably, our approach requires as few as one to three expert demonstrations to achieve high performance. We evaluate TREND on various robotic manipulation tasks, achieving up to 90% success rates even with noise levels as high as 40%, highlighting its effective robustness in handling noisy preference feedback. Project page: https://shuaiyihuang.github.io/publications/TREND.
comment: ICRA 2025
Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds
This paper introduces a centralized approach for fuel-efficient urban platooning by leveraging real-time Vehicle- to-Everything (V2X) communication and Signal Phase and Timing (SPaT) data. A nonlinear Model Predictive Control (MPC) algorithm optimizes the trajectories of platoon leader vehicles, employing an asymmetric cost function to minimize fuel-intensive acceleration. Following vehicles utilize a gap- and velocity-based control strategy, complemented by dynamic platoon splitting logic communicated through Platoon Control Messages (PCM) and Platoon Awareness Messages (PAM). Simulation results obtained from the CARLA environment demonstrate substantial fuel savings of up to 41.2%, along with smoother traffic flows, fewer vehicle stops, and improved intersection throughput.
comment: Accepted for publication at IV 2025
Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications
This paper demonstrates the applicability of the safe model predictive control (SMPC) framework to autonomous driving scenarios, focusing on the design of adaptive cruise control (ACC) and automated lane-change systems. Building on the SMPC approach with priority-driven constraint softening -- which ensures the satisfaction of \emph{hard} constraints under external disturbances by selectively softening a predefined subset of adjustable constraints -- we show how the algorithm dynamically relaxes lower-priority, comfort-related constraints in response to unexpected disturbances while preserving critical safety requirements such as collision avoidance and lane-keeping. A learning-based algorithm approximating the time consuming SMPC is introduced to enable real-time execution. Simulations in real-world driving scenarios subject to unpredicted disturbances confirm that this prioritized softening mechanism consistently upholds stringent safety constraints, underscoring the effectiveness of the proposed method.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control. arXiv admin note: text overlap with arXiv:2503.15373
Adaptive Robot Localization with Ultra-wideband Novelty Detection
Ultra-wideband (UWB) technology has shown remarkable potential as a low-cost general solution for robot localization. However, limitations of the UWB signal for precise positioning arise from the disturbances caused by the environment itself, due to reflectance, multi-path effect, and Non-Line-of-Sight (NLOS) conditions. This problem is emphasized in cluttered indoor spaces where service robotic platforms usually operate. Both model-based and learning-based methods are currently under investigation to precisely predict the UWB error patterns. Despite the great capability in approximating strong non-linearity, learning-based methods often do not consider environmental factors and require data collection and re-training for unseen data distributions, making them not practically feasible on a large scale. The goal of this research is to develop a robust and adaptive UWB localization method for indoor confined spaces. A novelty detection technique is used to recognize outlier conditions from nominal UWB range data with a semi-supervised autoencoder. Then, the obtained novelty scores are combined with an Extended Kalman filter, leveraging a dynamic estimation of covariance and bias error for each range measurement received from the UWB anchors. The resulting solution is a compact, flexible, and robust system which enables the localization system to adapt the trustworthiness of UWB data spatially and temporally in the environment. The extensive experimentation conducted with a real robot in a wide range of testing scenarios demonstrates the advantages and benefits of the proposed solution in indoor cluttered spaces presenting NLoS conditions, reaching an average improvement of almost 60% and greater than 25cm of absolute positioning error.
Collecting Human Motion Data in Large and Occlusion-Prone Environments using Ultra-Wideband Localization ICRA
With robots increasingly integrating into human environments, understanding and predicting human motion is essential for safe and efficient interactions. Modern human motion and activity prediction approaches require high quality and quantity of data for training and evaluation, usually collected from motion capture systems, onboard or stationary sensors. Setting up these systems is challenging due to the intricate setup of hardware components, extensive calibration procedures, occlusions, and substantial costs. These constraints make deploying such systems in new and large environments difficult and limit their usability for in-the-wild measurements. In this paper we investigate the possibility to apply the novel Ultra-Wideband (UWB) localization technology as a scalable alternative for human motion capture in crowded and occlusion-prone environments. We include additional sensing modalities such as eye-tracking, onboard robot LiDAR and radar sensors, and record motion capture data as ground truth for evaluation and comparison. The environment imitates a museum setup, with up to four active participants navigating toward random goals in a natural way, and offers more than 130 minutes of multi-modal data. Our investigation provides a step toward scalable and accurate motion data collection beyond vision-based systems, laying a foundation for evaluating sensing modalities like UWB in larger and complex environments like warehouses, airports, or convention centers.
comment: accepted for presentation at the 7th Workshop on Long-term Human Motion Prediction (LHMP) at International Conference on Robotics and Automation (ICRA) 2025
Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields
This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on the consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.
Augmented Body Communicator: Enhancing daily body expression for people with upper limb limitations through LLM and a robotic arm
Individuals with upper limb movement limitations face challenges in interacting with others. Although robotic arms are currently used primarily for functional tasks, there is considerable potential to explore ways to enhance users' body language capabilities during social interactions. This paper introduces an Augmented Body Communicator system that integrates robotic arms and a large language model. Through the incorporation of kinetic memory, disabled users and their supporters can collaboratively design actions for the robot arm. The LLM system then provides suggestions on the most suitable action based on contextual cues during interactions. The system underwent thorough user testing with six participants who have conditions affecting upper limb mobility. Results indicate that the system improves users' ability to express themselves. Based on our findings, we offer recommendations for developing robotic arms that support disabled individuals with body language capabilities and functional tasks.
Oh F**k! How Do People Feel about Robots that Leverage Profanity?
Profanity is nearly as old as language itself, and cursing has become particularly ubiquitous within the last century. At the same time, robots in personal and service applications are often overly polite, even though past work demonstrates the potential benefits of robot norm-breaking. Thus, we became curious about robots using curse words in error scenarios as a means for improving social perceptions by human users. We investigated this idea using three phases of exploratory work: an online video-based study (N = 76) with a student pool, an online video-based study (N = 98) in the general U.S. population, and an in-person proof-of-concept deployment (N = 52) in a campus space, each of which included the following conditions: no-speech, non-expletive error response, and expletive error response. A surprising result in the outcomes for all three studies was that although verbal acknowledgment of an error was typically beneficial (as expected based on prior work), few significant differences appeared between the non-expletive and expletive error acknowledgment conditions (counter to our expectations). Within the cultural context of our work, the U.S., it seems that many users would likely not mind if robots curse, and may even find it relatable and humorous. This work signals a promising and mischievous design space that challenges typical robot character design.
comment: Under review for the 2025 IEEE RO-MAN Conference
Unsupervised Anomaly Detection for Autonomous Robots via Mahalanobis SVDD with Audio-IMU Fusion
Reliable anomaly detection is essential for ensuring the safety of autonomous robots, particularly when conventional detection systems based on vision or LiDAR become unreliable in adverse or unpredictable conditions. In such scenarios, alternative sensing modalities are needed to provide timely and robust feedback. To this end, we explore the use of audio and inertial measurement unit (IMU) sensors to detect underlying anomalies in autonomous mobile robots, such as collisions and internal mechanical faults. Furthermore, to address the challenge of limited labeled anomaly data, we propose an unsupervised anomaly detection framework based on Mahalanobis Support Vector Data Description (M-SVDD). In contrast to conventional SVDD methods that rely on Euclidean distance and assume isotropic feature distributions, our approach employs the Mahalanobis distance to adaptively scale feature dimensions and capture inter-feature correlations, enabling more expressive decision boundaries. In addition, a reconstruction-based auxiliary branch is introduced to preserve feature diversity and prevent representation collapse, further enhancing the robustness of anomaly detection. Extensive experiments on a collected mobile robot dataset and four public datasets demonstrate the effectiveness of the proposed method, as shown in the video https://youtu.be/yh1tn6DDD4A. Code and dataset are available at https://github.com/jamesyang7/M-SVDD.
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks CVPR 2025
Robotic manipulation in 3D requires learning an $N$ degree-of-freedom joint space trajectory of a robot manipulator. Robots must possess semantic and visual perception abilities to transform real-world mappings of their workspace into the low-level control necessary for object manipulation. Recent work has demonstrated the capabilities of fine-tuning large Vision-Language Models (VLMs) to learn the mapping between RGB images, language instructions, and joint space control. These models typically take as input RGB images of the workspace and language instructions, and are trained on large datasets of teleoperated robot demonstrations. In this work, we explore methods to improve the scene context awareness of a popular recent Vision-Language-Action model by integrating chain-of-thought reasoning, depth perception, and task-oriented region of interest detection. Our experiments in the LIBERO simulation environment show that our proposed model, 3D-CAVLA, improves the success rate across various LIBERO task suites, achieving an average success rate of 98.1$\%$. We also evaluate the zero-shot capabilities of our method, demonstrating that 3D scene awareness leads to robust learning and adaptation for completely unseen tasks. 3D-CAVLA achieves an absolute improvement of 8.8$\%$ on unseen tasks. We will open-source our code and the unseen tasks dataset to promote community-driven research here: https://3d-cavla.github.io
comment: Accepted at the 1st Workshop on 3D LLM/VLA, CVPR 2025
Formation Maneuver Control Based on the Augmented Laplacian Method
This paper proposes a novel formation maneuver control method for both 2-D and 3-D space, which enables the formation to translate, scale, and rotate with arbitrary orientation. The core innovation is the novel design of weights in the proposed augmented Laplacian matrix. Instead of using scalars, we represent weights as matrices, which are designed based on a specified rotation axis and allow the formation to perform rotation in 3-D space. To further improve the flexibility and scalability of the formation, the rotational axis adjustment approach and dynamic agent reconfiguration method are developed, allowing formations to rotate around arbitrary axes in 3-D space and new agents to join the formation. Theoretical analysis is provided to show that the proposed approach preserves the original configuration of the formation. The proposed method maintains the advantages of the complex Laplacian-based method, including reduced neighbor requirements and no reliance on generic or convex nominal configurations, while achieving arbitrary orientation rotations via a more simplified implementation. Simulations in both 2-D and 3-D space validate the effectiveness of the proposed method.
Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives
Diffusion policies have demonstrated remarkable dexterity and robustness in intricate, high-dimensional robot manipulation tasks, while training from a small number of demonstrations. However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table -- and this is beneficial. We posit that, at runtime, diffusion policies find the closest training image to the test image in a latent space, and recall the associated training action sequence, offering reactivity without the need for action generalization. This is effective in the sparse data regime, where there is not enough data density for the model to learn action generalization. We support this claim with systematic empirical evidence. Even when conditioned on wildly out of distribution (OOD) images of cats and dogs, the Diffusion Policy still outputs an action sequence from the training data. With this insight, we propose a simple policy, the Action Lookup Table (ALT), as a lightweight alternative to the Diffusion Policy. Our ALT policy uses a contrastive image encoder as a hash function to index the closest corresponding training action sequence, explicitly performing the computation that the Diffusion Policy implicitly learns. We show empirically that for relatively small datasets, ALT matches the performance of a diffusion model, while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, allowing for much faster closed-loop inference with resource constrained robots. We also train our ALT policy to give an explicit OOD flag when the distance between the runtime image is too far in the latent space from the training images, giving a simple but effective runtime monitor. More information can be found at: https://stanfordmsl.github.io/alt/.
Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination ICRA 2025
Recently, many humanoid robots have been increasingly deployed in various facilities, including hospitals and assisted living environments, where they are often remotely controlled by human operators. Their kinematic redundancy enhances reachability and manipulability, enabling them to navigate complex, cluttered environments and perform a wide range of tasks. However, this redundancy also presents significant control challenges, particularly in coordinating the movements of the robot's macro-micro structure (torso and arms). Therefore, we propose various human-robot collaborative (HRC) methods for coordinating the torso and arm of remotely controlled mobile humanoid robots, aiming to balance autonomy and human input to enhance system efficiency and task execution. The proposed methods include human-initiated approaches, where users manually control torso movements, and robot-initiated approaches, which autonomously coordinate torso and arm based on factors such as reachability, task goal, or inferred human intent. We conducted a user study with N=17 participants to compare the proposed approaches in terms of task performance, manipulability, and energy efficiency, and analyzed which methods were preferred by participants.
comment: This work has been accepted for publication in 2025 IEEE International Conference on Robotics and Automation (ICRA 2025). The final published version will be available via IEEE Xplore
Multi-Agent Systems for Robotic Autonomy with LLMs
Since the advent of Large Language Models (LLMs), various research based on such models have maintained significant academic attention and impact, especially in AI and robotics. In this paper, we propose a multi-agent framework with LLMs to construct an integrated system for robotic task analysis, mechanical design, and path generation. The framework includes three core agents: Task Analyst, Robot Designer, and Reinforcement Learning Designer. Outputs are formatted as multimodal results, such as code files or technical reports, for stronger understandability and usability. To evaluate generalizability comparatively, we conducted experiments with models from both GPT and DeepSeek. Results demonstrate that the proposed system can design feasible robots with control strategies when appropriate task inputs are provided, exhibiting substantial potential for enhancing the efficiency and accessibility of robotic system development in research and industrial applications.
comment: 11 pages, 2 figures, 5 tables, submitted for publication
Towards Embodiment Scaling Laws in Robot Locomotion
Developing generalist agents that can operate across diverse tasks, environments, and physical embodiments is a grand challenge in robotics and artificial intelligence. In this work, we focus on the axis of embodiment and investigate embodiment scaling laws$\unicode{x2013}$the hypothesis that increasing the number of training embodiments improves generalization to unseen ones. Using robot locomotion as a test bed, we procedurally generate a dataset of $\sim$1,000 varied embodiments, spanning humanoids, quadrupeds, and hexapods, and train generalist policies capable of handling diverse observation and action spaces on random subsets. We find that increasing the number of training embodiments improves generalization to unseen ones, and scaling embodiments is more effective in enabling embodiment-level generalization than scaling data on small, fixed sets of embodiments. Notably, our best policy, trained on the full dataset, zero-shot transfers to novel embodiments in the real world, such as Unitree Go2 and H1. These results represent a step toward general embodied intelligence, with potential relevance to adaptive control for configurable robots, co-design of morphology and control, and beyond.
comment: 32 pages. Project website: https://embodiment-scaling-laws.github.io/
Automating Infrastructure Surveying: A Framework for Geometric Measurements and Compliance Assessment Using Point Cloud Data
Automation can play a prominent role in improving efficiency, accuracy, and scalability in infrastructure surveying and assessing construction and compliance standards. This paper presents a framework for automation of geometric measurements and compliance assessment using point cloud data. The proposed approach integrates deep learning-based detection and segmentation, in conjunction with geometric and signal processing techniques, to automate surveying tasks. As a proof of concept, we apply this framework to automatically evaluate the compliance of curb ramps with the Americans with Disabilities Act (ADA), demonstrating the utility of point cloud data in survey automation. The method leverages a newly collected, large annotated dataset of curb ramps, made publicly available as part of this work, to facilitate robust model training and evaluation. Experimental results, including comparison with manual field measurements of several ramps, validate the accuracy and reliability of the proposed method, highlighting its potential to significantly reduce manual effort and improve consistency in infrastructure assessment. Beyond ADA compliance, the proposed framework lays the groundwork for broader applications in infrastructure surveying and automated construction evaluation, promoting wider adoption of point cloud data in these domains. The annotated database, manual ramp survey data, and developed algorithms are publicly available on the project's GitHub page: https://github.com/Soltanilara/SurveyAutomation.
comment: 19 pages, 15 figures, 4 tables
Quantitative Hardness Assessment with Vision-based Tactile Sensing for Fruit Classification and Grasping ICRA
Accurate estimation of fruit hardness is essential for automated classification and handling systems, particularly in determining fruit variety, assessing ripeness, and ensuring proper harvesting force. This study presents an innovative framework for quantitative hardness assessment utilizing vision-based tactile sensing, tailored explicitly for robotic applications in agriculture. The proposed methodology derives normal force estimation from a vision-based tactile sensor, and, based on the dynamics of this normal force, calculates the hardness. This approach offers a rapid, non-destructive evaluation through single-contact interaction. The integration of this framework into robotic systems enhances real-time adaptability of grasping forces, thereby reducing the likelihood of fruit damage. Moreover, the general applicability of this approach, through a universal criterion based on average normal force dynamics, ensures its effectiveness across a wide variety of fruit types and sizes. Extensive experimental validation conducted across different fruit types and ripeness-tracking studies demonstrates the efficacy and robustness of the framework, marking a significant advancement in the domain of automated fruit handling.
comment: Accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning ICLR 2025
The motion planning problem involves finding a collision-free path from a robot's starting to its target configuration. Recently, self-supervised learning methods have emerged to tackle motion planning problems without requiring expensive expert demonstrations. They solve the Eikonal equation for training neural networks and lead to efficient solutions. However, these methods struggle in complex environments because they fail to maintain key properties of the Eikonal equation, such as optimal value functions and geodesic distances. To overcome these limitations, we propose a novel self-supervised temporal difference metric learning approach that solves the Eikonal equation more accurately and enhances performance in solving complex and unseen planning tasks. Our method enforces Bellman's principle of optimality over finite regions, using temporal difference learning to avoid spurious local minima while incorporating metric learning to preserve the Eikonal equation's essential geodesic properties. We demonstrate that our approach significantly outperforms existing self-supervised learning methods in handling complex environments and generalizing to unseen environments, with robot configurations ranging from 2 to 12 degrees of freedom (DOF).
comment: Accepted to ICLR 2025
Adaptive Wiping: Adaptive contact-rich manipulation through few-shot imitation learning with Force-Torque feedback and pre-trained object representations
Imitation learning offers a pathway for robots to perform repetitive tasks, allowing humans to focus on more engaging and meaningful activities. However, challenges arise from the need for extensive demonstrations and the disparity between training and real-world environments. This paper focuses on contact-rich tasks like wiping with soft and deformable objects, requiring adaptive force control to handle variations in wiping surface height and the sponge's physical properties. To address these challenges, we propose a novel method that integrates real-time force-torque (FT) feedback with pre-trained object representations. This approach allows robots to dynamically adjust to previously unseen changes in surface heights and sponges' physical properties. In real-world experiments, our method achieved 96% accuracy in applying reference forces, significantly outperforming the previous method that lacked an FT feedback loop, which only achieved 4% accuracy. To evaluate the adaptability of our approach, we conducted experiments under different conditions from the training setup, involving 40 scenarios using 10 sponges with varying physical properties and 4 types of wiping surface heights, demonstrating significant improvements in the robot's adaptability by analyzing force trajectories. The video of our work is available at: https://sites.google.com/view/adaptive-wiping
Autonomous Vision-Based Magnetic Microrobotic Pushing of Micro-Objects and Cells
Accurate and autonomous transportation of micro-objects and biological cells can enable significant advances in a wide variety of research disciplines. Here, we present a novel, vision-based, model-free microrobotic pushing algorithm for the autonomous manipulation of micro objects and biological cells. The algorithm adjusts the axis of a rotating magnetic field that in turn controls the heading angle and spin axis of a spherical Janus rolling microrobot. We introduce the concept of a microrobotic guiding corridor to constrain the object and to avoid pushing failures. We then show that employing only two simple conditions, the microrobot is able to successfully and autonomously push microscale objects along predefined trajectories. We evaluate the performance of the algorithm by measuring the mean absolute error and completion time relative to a desired path at different actuation frequencies and guiding corridor widths. Finally, we demonstrate biomedical applicability by autonomously transporting a single biological cell, highlighting the methods potential for applications in tissue engineering, drug delivery and synthetic biology.
Direct Data Driven Control Using Noisy Measurements
This paper presents a novel direct data-driven control framework for solving the linear quadratic regulator (LQR) under disturbances and noisy state measurements. The system dynamics are assumed unknown, and the LQR solution is learned using only a single trajectory of noisy input-output data while bypassing system identification. Our approach guarantees mean-square stability (MSS) and optimal performance by leveraging convex optimization techniques that incorporate noise statistics directly into the controller synthesis. First, we establish a theoretical result showing that the MSS of an uncertain data-driven system implies the MSS of the true closed-loop system. Building on this, we develop a robust stability condition using linear matrix inequalities (LMIs) that yields a stabilizing controller gain from noisy measurements. Finally, we formulate a data-driven LQR problem as a semidefinite program (SDP) that computes an optimal gain, minimizing the steady-state covariance. Extensive simulations on benchmark systems -- including a rotary inverted pendulum and an active suspension system -- demonstrate the superior robustness and accuracy of our method compared to existing data-driven LQR approaches. The proposed framework offers a practical and theoretically grounded solution for controller design in noise-corrupted environments where system identification is infeasible.
comment: Submitted to IEEE-TAC
Camera Control at the Edge with Language Models for Scene Understanding
In this paper, we present Optimized Prompt-based Unified System (OPUS), a framework that utilizes a Large Language Model (LLM) to control Pan-Tilt-Zoom (PTZ) cameras, providing contextual understanding of natural environments. To achieve this goal, the OPUS system improves cost-effectiveness by generating keywords from a high-level camera control API and transferring knowledge from larger closed-source language models to smaller ones through Supervised Fine-Tuning (SFT) on synthetic data. This enables efficient edge deployment while maintaining performance comparable to larger models like GPT-4. OPUS enhances environmental awareness by converting data from multiple cameras into textual descriptions for language models, eliminating the need for specialized sensory tokens. In benchmark testing, our approach significantly outperformed both traditional language model techniques and more complex prompting methods, achieving a 35% improvement over advanced techniques and a 20% higher task accuracy compared to closed-source models like Gemini Pro. The system demonstrates OPUS's capability to simplify PTZ camera operations through an intuitive natural language interface. This approach eliminates the need for explicit programming and provides a conversational method for interacting with camera systems, representing a significant advancement in how users can control and utilize PTZ camera technology.
comment: 7 pages, 6 figures. This work was presented and published at the 11th IEEE International Conference on Control, Automation and Robotics (ICCAR) in 2025
LLM-Land: Large Language Models for Context-Aware Drone Landing
Autonomous landing is essential for drones deployed in emergency deliveries, post-disaster response, and other large-scale missions. By enabling self-docking on charging platforms, it facilitates continuous operation and significantly extends mission endurance. However, traditional approaches often fall short in dynamic, unstructured environments due to limited semantic awareness and reliance on fixed, context-insensitive safety margins. To address these limitations, we propose a hybrid framework that integrates large language model (LLMs) with model predictive control (MPC). Our approach begins with a vision-language encoder (VLE) (e.g., BLIP), which transforms real-time images into concise textual scene descriptions. These descriptions are processed by a lightweight LLM (e.g., Qwen 2.5 1.5B or LLaMA 3.2 1B) equipped with retrieval-augmented generation (RAG) to classify scene elements and infer context-aware safety buffers, such as 3 meters for pedestrians and 5 meters for vehicles. The resulting semantic flags and unsafe regions are then fed into an MPC module, enabling real-time trajectory replanning that avoids collisions while maintaining high landing precision. We validate our framework in the ROS-Gazebo simulator, where it consistently outperforms conventional vision-based MPC baselines. Our results show a significant reduction in near-miss incidents with dynamic obstacles, while preserving accurate landings in cluttered environments.
Learning Sequential Kinematic Models from Demonstrations for Multi-Jointed Articulated Objects
As robots become more generalized and deployed in diverse environments, they must interact with complex objects, many with multiple independent joints or degrees of freedom (DoF) requiring precise control. A common strategy is object modeling, where compact state-space models are learned from real-world observations and paired with classical planning. However, existing methods often rely on prior knowledge or focus on single-DoF objects, limiting their applicability. They also fail to handle occluded joints and ignore the manipulation sequences needed to access them. We address this by learning object models from human demonstrations. We introduce Object Kinematic Sequence Machines (OKSMs), a novel representation capturing both kinematic constraints and manipulation order for multi-DoF objects. To estimate these models from point cloud data, we present Pokenet, a deep neural network trained on human demonstrations. We validate our approach on 8,000 simulated and 1,600 real-world annotated samples. Pokenet improves joint axis and state estimation by over 20 percent on real-world data compared to prior methods. Finally, we demonstrate OKSMs on a Sawyer robot using inverse kinematics-based planning to manipulate multi-DoF objects.
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PbRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of discriminable queries available for learning preferences. This paper identifies preference discriminability, which quantifies how easily a human can judge which trajectory is closer to their ideal behavior, as a key metric for improving query efficiency. To address this, we move beyond comparisons within a single policy and instead generate queries by comparing trajectories from multiple policies, as training them from scratch promotes diversity without policy bias. We propose Discriminability-Aware Policy-to-Policy Preference-Based Efficient Reinforcement Learning (DAPPER), which integrates preference discriminability with trajectory diversification achieved by multiple policies. DAPPER trains new policies from scratch after each reward update and employs a discriminator that learns to estimate preference discriminability, enabling the prioritized sampling of more discriminable queries. During training, it jointly maximizes the preference reward and preference discriminability score, encouraging the discovery of highly rewarding and easily distinguishable policies. Experiments in simulated and real-world legged robot environments demonstrate that DAPPER outperforms previous methods in query efficiency, particularly under challenging preference discriminability conditions.
"Set It Up!": Functional Object Arrangement with Compositional Generative Models
This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages large language models (LLMs) to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.
comment: 10 pages main paper, 21 pages appendix, RSS 2024
Learning-based adaption of robotic friction models
In the Fourth Industrial Revolution, wherein artificial intelligence and the automation of machines occupy a central role, the deployment of robots is indispensable. However, the manufacturing process using robots, especially in collaboration with humans, is highly intricate. In particular, modeling the friction torque in robotic joints is a longstanding problem due to the lack of a good mathematical description. This motivates the usage of data-driven methods in recent works. However, model-based and data-driven models often exhibit limitations in their ability to generalize beyond the specific dynamics they were trained on, as we demonstrate in this paper. To address this challenge, we introduce a novel approach based on residual learning, which aims to adapt an existing friction model to new dynamics using as little data as possible. We validate our approach by training a base neural network on a symmetric friction data set to learn an accurate relation between the velocity and the friction torque. Subsequently, to adapt to more complex asymmetric settings, we train a second network on a small dataset, focusing on predicting the residual of the initial network's output. By combining the output of both networks in a suitable manner, our proposed estimator outperforms the conventional model-based approach, an extended LuGre model, and the base neural network significantly. Furthermore, we evaluate our method on trajectories involving external loads and still observe a substantial improvement, approximately 60-70%, over the conventional approach. Our method does not rely on data with external load during training, eliminating the need for external torque sensors. This demonstrates the generalization capability of our approach, even with a small amount of data--less than a minute--enabling adaptation to diverse scenarios based on prior knowledge about friction in different settings.
Drive in Corridors: Enhancing the Safety of End-to-end Autonomous Driving via Corridor Learning and Planning
Safety remains one of the most critical challenges in autonomous driving systems. In recent years, the end-to-end driving has shown great promise in advancing vehicle autonomy in a scalable manner. However, existing approaches often face safety risks due to the lack of explicit behavior constraints. To address this issue, we uncover a new paradigm by introducing the corridor as the intermediate representation. Widely adopted in robotics planning, the corridors represents spatio-temporal obstacle-free zones for the vehicle to traverse. To ensure accurate corridor prediction in diverse traffic scenarios, we develop a comprehensive learning pipeline including data annotation, architecture refinement and loss formulation. The predicted corridor is further integrated as the constraint in a trajectory optimization process. By extending the differentiability of the optimization, we enable the optimized trajectory to be seamlessly trained within the end-to-end learning framework, improving both safety and interpretability. Experimental results on the nuScenes dataset demonstrate state-of-the-art performance of our approach, showing a 66.7% reduction in collisions with agents and a 46.5% reduction with curbs, significantly enhancing the safety of end-to-end driving. Additionally, incorporating the corridor contributes to higher success rates in closed-loop evaluations. Project page: https://zhiwei-pg.github.io/Drive-in-Corridors.
comment: 8 pages, 4 figures, accepted by RA-L
Deep hybrid models: infer and plan in a dynamic world
To determine an optimal plan for complex tasks, one often deals with dynamic and hierarchical relationships between several entities. Traditionally, such problems are tackled with optimal control, which relies on the optimization of cost functions; instead, a recent biologically-motivated proposal casts planning and control as an inference process. Active inference assumes that action and perception are two complementary aspects of life whereby the role of the former is to fulfill the predictions inferred by the latter. Here, we present an active inference approach that exploits discrete and continuous processing, based on three features: the representation of potential body configurations in relation to the objects of interest; the use of hierarchical relationships that enable the agent to easily interpret and flexibly expand its body schema for tool use; the definition of potential trajectories related to the agent's intentions, used to infer and plan with dynamic elements at different temporal scales. We evaluate this deep hybrid model on a habitual task: reaching a moving object after having picked a moving tool. We show that the model can tackle the presented task under different conditions. This study extends past work on planning as inference and advances an alternative direction to optimal control.
Exact Imposition of Safety Boundary Conditions in Neural Reachable Tubes ICRA 2025
Hamilton-Jacobi (HJ) reachability analysis is a widely adopted verification tool to provide safety and performance guarantees for autonomous systems. However, it involves solving a partial differential equation (PDE) to compute a safety value function, whose computational and memory complexity scales exponentially with the state dimension, making its direct application to large-scale systems intractable. To overcome these challenges, DeepReach, a recently proposed learning-based approach, approximates high-dimensional reachable tubes using neural networks (NNs). While shown to be effective, the accuracy of the learned solution decreases with system complexity. One of the reasons for this degradation is a soft imposition of safety constraints during the learning process, which corresponds to the boundary conditions of the PDE, resulting in inaccurate value functions. In this work, we propose ExactBC, a variant of DeepReach that imposes safety constraints exactly during the learning process by restructuring the overall value function as a weighted sum of the boundary condition and the NN output. Moreover, the proposed variant no longer needs a boundary loss term during the training process, thus eliminating the need to balance different loss terms. We demonstrate the efficacy of the proposed approach in significantly improving the accuracy of the learned value function for four challenging reachability tasks: a rimless wheel system with state resets, collision avoidance in a cluttered environment, autonomous rocket landing, and multi-aircraft collision avoidance.
comment: First two authors have contributed equally. 7 Pages, 3 figures. Accepted at ICRA 2025
QuickGrasp: Lightweight Antipodal Grasp Planning with Point Clouds
Grasping has been a long-standing challenge in facilitating the final interface between a robot and the environment. As environments and tasks become complicated, the need to embed higher intelligence to infer from the surroundings and act on them has become necessary. Although most methods utilize techniques to estimate grasp pose by treating the problem via pure sampling-based approaches in the six-degree-of-freedom space or as a learning problem, they usually fail in real-life settings owing to poor generalization across domains. In addition, the time taken to generate the grasp plan and the lack of repeatability, owing to sampling inefficiency and the probabilistic nature of existing grasp planning approaches, severely limits their application in real-world tasks. This paper presents a lightweight analytical approach towards robotic grasp planning, particularly antipodal grasps, with little to no sampling in the six-degree-of-freedom space. The proposed grasp planning algorithm is formulated as an optimization problem towards estimating grasp points on the object surface instead of directly estimating the end-effector pose. To this extent, a soft-region-growing algorithm is presented for effective plane segmentation, even in the case of curved surfaces. An optimization-based quality metric is then used for the evaluation of grasp points to ensure indirect force closure. The proposed grasp framework is compared with the existing state-of-the-art grasp planning approach, Grasp pose detection (GPD), as a baseline over multiple simulated objects. The effectiveness of the proposed approach in comparison to GPD is also evaluated in a real-world setting using image and point-cloud data, with the planned grasps being executed using a ROBOTIQ gripper and UR5 manipulator.
A High Efficient and Scalable Obstacle-Avoiding VLSI Global Routing Flow
Routing is a crucial step in the VLSI design flow. With the advancement of manufacturing technologies, more constraints have emerged in design rules, particularly regarding obstacles during routing, leading to increased routing complexity. Unfortunately, many global routers struggle to efficiently generate obstacle-free solutions due to the lack of scalable obstacle-avoiding tree generation methods and the capability of handling modern designs with complex obstacles and nets. In this work, we propose an efficient obstacle-aware global routing flow for VLSI designs with obstacles. The flow includes a rule-based obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) algorithm during the tree generation phase. This algorithm is both scalable and fast to provide tree topologies avoiding obstacles in the early stage globally. With its guidance, OARSMT-guided and obstacle-aware sparse maze routing are proposed in the later stages to minimize obstacle violations further and reduce overflow costs. Compared to advanced methods on the benchmark with obstacles, our approach successfully eliminates obstacle violations, and reduces wirelength and overflow cost, while sacrificing only a limited number of via counts and runtime overhead.
comment: Currently submitting to a journal. Fixed the misaligned numbers in the result table of the previous version
Embedded Hierarchical MPC for Autonomous Navigation
To efficiently deploy robotic systems in society, mobile robots must move autonomously and safely through complex environments. Nonlinear model predictive control (MPC) methods provide a natural way to find a dynamically feasible trajectory through the environment without colliding with nearby obstacles. However, the limited computation power available on typical embedded robotic systems, such as quadrotors, poses a challenge to running MPC in real time, including its most expensive tasks: constraints generation and optimization. To address this problem, we propose a novel hierarchical MPC scheme that consists of a planning and a tracking layer. The planner constructs a trajectory with a long prediction horizon at a slow rate, while the tracker ensures trajectory tracking at a relatively fast rate. We prove that the proposed framework avoids collisions and is recursively feasible. Furthermore, we demonstrate its effectiveness in simulations and lab experiments with a quadrotor that needs to reach a goal position in a complex static environment. The code is efficiently implemented on the quadrotor's embedded computer to ensure real-time feasibility. Compared to a state-of-the-art single-layer MPC formulation, this allows us to increase the planning horizon by a factor of 5, which results in significantly better performance.
comment: 19 pages, 15 figures (excluding biography entries)
3D Hand-Eye Calibration for Collaborative Robot Arm: Look at Robot Base Once
Hand-eye calibration is a common problem in the field of collaborative robotics, involving the determination of the transformation matrix between the visual sensor and the robot flange to enable vision-based robotic tasks. However, this process typically requires multiple movements of the robot arm and an external calibration object, making it both time-consuming and inconvenient, especially in scenarios where frequent recalibration is necessary. In this work, we extend our previous method which eliminates the need for external calibration objects such as a chessboard. We propose a generic dataset generation approach for point cloud registration, focusing on aligning the robot base point cloud with the scanned data. Furthermore, a more detailed simulation study is conducted involving several different collaborative robot arms, followed by real-world experiments in an industrial setting. Our improved method is simulated and evaluated using a total of 14 robotic arms from 9 different brands, including KUKA, Universal Robots, UFACTORY, and Franka Emika, all of which are widely used in the field of collaborative robotics. Physical experiments demonstrate that our extended approach achieves performance comparable to existing commercial hand-eye calibration solutions, while completing the entire calibration procedure in just a few seconds. In addition, we provide a user-friendly hand-eye calibration solution, with the code publicly available at github.com/leihui6/LRBO.
comment: updated
TAPTRv2: Attention-based Position Update Improves Tracking Any Point
In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume,which contaminates the point query\'s content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize. For each query, this operation uses key-aware attention weights to combine their corresponding deformable sampling positions to predict a new query position. This design is based on the observation that local attention is essentially the same as cost-volume, both of which are computed by dot-production between a query and its surrounding features. By introducing this new operation, TAPTRv2 not only removes the extra burden of cost-volume computation, but also leads to a substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves state-of-the-art performance on many challenging datasets, demonstrating the superiority
A Learning Framework for Diverse Legged Robot Locomotion Using Barrier-Based Style Rewards ICRA
This work introduces a model-free reinforcement learning framework that enables various modes of motion (quadruped, tripod, or biped) and diverse tasks for legged robot locomotion. We employ a motion-style reward based on a relaxed logarithmic barrier function as a soft constraint, to bias the learning process toward the desired motion style, such as gait, foot clearance, joint position, or body height. The predefined gait cycle is encoded in a flexible manner, facilitating gait adjustments throughout the learning process. Extensive experiments demonstrate that KAIST HOUND, a 45 kg robotic system, can achieve biped, tripod, and quadruped locomotion using the proposed framework; quadrupedal capabilities include traversing uneven terrain, galloping at 4.67 m/s, and overcoming obstacles up to 58 cm (67 cm for HOUND2); bipedal capabilities include running at 3.6 m/s, carrying a 7.5 kg object, and ascending stairs-all performed without exteroceptive input.
comment: 7 pages, Videos at https://youtu.be/JV2_HfTlOKI, IEEE International Conference on Robotics and Automation (ICRA) 2025
RS2AD: End-to-End Autonomous Driving Data Generation from Roadside Sensor Observations
End-to-end autonomous driving solutions, which process multi-modal sensory data to directly generate refined control commands, have become a dominant paradigm in autonomous driving research. However, these approaches predominantly depend on single-vehicle data collection for model training and optimization, resulting in significant challenges such as high data acquisition and annotation costs, the scarcity of critical driving scenarios, and fragmented datasets that impede model generalization. To mitigate these limitations, we introduce RS2AD, a novel framework for reconstructing and synthesizing vehicle-mounted LiDAR data from roadside sensor observations. Specifically, our method transforms roadside LiDAR point clouds into the vehicle-mounted LiDAR coordinate system by leveraging the target vehicle's relative pose. Subsequently, high-fidelity vehicle-mounted LiDAR data is synthesized through virtual LiDAR modeling, point cloud classification, and resampling techniques. To the best of our knowledge, this is the first approach to reconstruct vehicle-mounted LiDAR data from roadside sensor inputs. Extensive experimental evaluations demonstrate that incorporating the data generated by the RS2AD method (the RS2V-L dataset) into model training as a supplement to the KITTI dataset can significantly enhance the accuracy of 3D object detection and greatly improve the efficiency of end-to-end autonomous driving data generation. These findings strongly validate the effectiveness of the proposed method and underscore its potential in reducing dependence on costly vehicle-mounted data collection while improving the robustness of autonomous driving models.
End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data
In autonomous driving, the end-to-end (E2E) driving approach that predicts vehicle control signals directly from sensor data is rapidly gaining attention. To learn a safe E2E driving system, one needs an extensive amount of driving data and human intervention. Vehicle control data is constructed by many hours of human driving, and it is challenging to construct large vehicle control datasets. Often, publicly available driving datasets are collected with limited driving scenes, and collecting vehicle control data is only available by vehicle manufacturers. To address these challenges, this letter proposes the first fully self-supervised learning framework, self-supervised imitation learning (SSIL), for E2E driving, based on the self-supervised regression learning (SSRL) framework.The proposed SSIL framework can learn E2E driving networks \emph{without} using driving command data or a pre-trained model. To construct pseudo steering angle data, proposed SSIL predicts a pseudo target from the vehicle's poses at the current and previous time points that are estimated with light detection and ranging sensors. In addition, we propose two E2E driving networks that predict driving commands depending on high-level instruction. Our numerical experiments with three different benchmark datasets demonstrate that the proposed SSIL framework achieves \emph{very} comparable E2E driving accuracy with the supervised learning counterpart. The proposed pseudo-label predictor outperformed an existing one using proportional integral derivative controller.
comment: 9 pages, 6 figures
Active Contact Engagement for Aerial Navigation in Unknown Environments with Glass
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent glass obstacles present significant challenges to reliable navigation. Researchers have investigated the use of non-contact sensors and passive contact-resilient aerial vehicle designs to detect glass surfaces, which are often limited in terms of robustness and efficiency. In this work, we propose a novel approach for robust autonomous aerial navigation in unknown environments with transparent glass obstacles, combining the strengths of both sensor-based and contact-based glass detection. The proposed system begins with the incremental detection and information maintenance about potential glass surfaces using visual sensor measurements. The vehicle then actively engages in touch actions with the visually detected potential glass surfaces using a pair of lightweight contact-sensing modules to confirm or invalidate their presence. Following this, the volumetric map is efficiently updated with the glass surface information and safe trajectories are replanned on the fly to circumvent the glass obstacles. We validate the proposed system through real-world experiments in various scenarios, demonstrating its effectiveness in enabling efficient and robust autonomous aerial navigation in complex real-world environments with glass obstacles.
comment: Accepted in the IEEE RA-L. See video at https://youtu.be/AKw-umiDkPU?si=zaNqecdD5sp9m3pZ
Data-Dependent Hidden Markov Model with Off-Road State Determination and Real-Time Viterbi Algorithm for Lane Determination in Autonomous Vehicles
Lane determination and lane sequence determination are important components for many Connected and Automated Vehicle (CAV) applications. Lane determination has been solved using Hidden Markov Model (HMM) among other methods. The existing HMM literature for lane sequence determination uses empirical definitions with user-modified parameters to calculate HMM probabilities. The probability definitions in the literature can cause breaks in the HMM due to the inability to directly calculate probabilities of off-road positions, requiring post-processing of data. This paper develops a time-varying HMM using the physical properties of the roadway and vehicle, and the stochastic properties of the sensors. This approach yields emission and transition probability models conditioned on the sensor data without parameter tuning. It also accounts for the probability that the vehicle is not in any roadway lane (e.g., on the shoulder or making a U-turn), which eliminates the need for post-processing to deal with breaks in the HMM processing. This approach requires adapting the Viterbi algorithm and the HMM to be conditioned on the sensor data, which are then used to generate the most-likely sequence of lanes the vehicle has traveled. The proposed approach achieves an average accuracy of 95.9%. Compared to the existing literature, this provides an average increase of 2.25% by implementing the proposed transition probability and an average increase of 5.1% by implementing both the proposed transition and emission probabilities.
comment: 15 pages, 5 figures, 5 tables, Journal
ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics ICML 2025
Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to align robot behavior with user intentions better. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and match the demonstrated behaviors optimally. ELEMENTAL also introduces an iterative feedback-loop through self-reflection to improve feature, reward, and policy learning. Our experiment results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in LfD.
comment: ICML 2025
AsymDex: Asymmetry and Relative Coordinates for RL-based Bimanual Dexterity
We present Asymmetric Dexterity (AsymDex), a novel and simple reinforcement learning (RL) framework that can efficiently learn a large class of bimanual skills in multi-fingered hands without relying on demonstrations. Two crucial insights enable AsymDex to reduce the observation and action space dimensions and improve sample efficiency. First, true ambidexterity is rare in humans and most of us exhibit strong "handedness". Inspired by this observation, we assign complementary roles to each hand: the facilitating hand repositions and reorients one object, while the dominant hand performs complex manipulations to achieve the desired result (e.g., opening a bottle cap, or pouring liquids). Second, controlling the relative motion between the hands is crucial for coordination and synchronization of the two hands. As such, we design relative observation and action spaces and leverage a relative-pose tracking controller. Further, we propose a two-phase decomposition in which AsymDex can be readily integrated with recent advances in grasp learning to facilitate both the acquisition and manipulation of objects using two hands. Unlike existing RL-based methods for bimanual dexterity with multi-fingered hands, which are either sample inefficient or tailored to a specific task, AsymDex can efficiently learn a wide variety of bimanual skills that exhibit asymmetry. Detailed experiments on seven asymmetric bimanual dexterous manipulation tasks (four simulated and three real-world) reveal that AsymDex consistently outperforms strong baselines that challenge our design choices. The project website is at https://sites.google.com/view/asymdex-2025/.
comment: Accepted by CoRL 2024 Workshop WCBM
VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation
The rapid advancement of generative AI and multi-modal foundation models has shown significant potential in advancing robotic manipulation. Vision-language-action (VLA) models, in particular, have emerged as a promising approach for visuomotor control by leveraging large-scale vision-language data and robot demonstrations. However, current VLA models are typically evaluated using a limited set of hand-crafted scenes, leaving their general performance and robustness in diverse scenarios largely unexplored. To address this gap, we present VLATest, a fuzzing framework designed to generate robotic manipulation scenes for testing VLA models. Based on VLATest, we conducted an empirical study to assess the performance of seven representative VLA models. Our study results revealed that current VLA models lack the robustness necessary for practical deployment. Additionally, we investigated the impact of various factors, including the number of confounding objects, lighting conditions, camera poses, unseen objects, and task instruction mutations, on the VLA model's performance. Our findings highlight the limitations of existing VLA models, emphasizing the need for further research to develop reliable and trustworthy VLA applications.
comment: To appear in FSE '25 (Proceedings of ACM Software Engineering, Vol. 2, Issue FSE, Article FSE073), 24 pages, 7 figures
Systems and Control (CS)
Leveraging Multi-Task Learning for Multi-Label Power System Security Assessment
This paper introduces a novel approach to the power system security assessment using Multi-Task Learning (MTL), and reformulating the problem as a multi-label classification task. The proposed MTL framework simultaneously assesses static, voltage, transient, and small-signal stability, improving both accuracy and interpretability with respect to the most state of the art machine learning methods. It consists of a shared encoder and multiple decoders, enabling knowledge transfer between stability tasks. Experiments on the IEEE 68-bus system demonstrate a measurable superior performance of the proposed method compared to the extant state-of-the-art approaches.
Robust Multi-Agent Decision-Making in Finite-Population Games
We study the robustness of an agent decision-making model in finite-population games, with a particular focus on the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model. Specifically, we examine how the model's parameters influence the effects of various sources of noise and modeling inaccuracies -- factors commonly encountered in engineering applications of population games -- on agents' decision-making. Our analysis provides insights into how these parameters can be effectively tuned to mitigate such effects. Theoretical results are supported by numerical examples and simulation studies that validate the analysis and illustrate practical strategies for parameter selection.
ABAMGuid+: An Enhanced Aerocapture Guidance Framework using Augmented Bank Angle Modulation
Aerocapture consists of converting a hyperbolic approach trajectory into a captured target orbit utilizing the aerodynamic forces generated via a single pass through the atmosphere. Aerocapture guidance systems must be robust to significant environmental variations and modeling uncertainty, particularly regarding atmospheric properties and delivery conditions. Recent work has shown that enabling control over both bank angle and angle of attack, a strategy referred to as augmented bank angle modulation (ABAM), can improve robustness to entry state and atmospheric uncertainties. In this work, we derive optimal control solutions for an aerocapture vehicle using ABAM. We first formulate the problem using a linear aerodynamic model and derive closed-form optimal control profiles using Pontryagin's Minimum Principle. To increase modeling fidelity, we also consider a quadratic aerodynamic model and obtain the solution directly using the optimality conditions. Both formulations are solved numerically using Gauss pseudospectral methods (via GPOPS, a software tool for pseudospectral optimal control), to validate the analytic solutions. We then introduce a novel aerocapture guidance algorithm, ABAMGuid+, which indirectly minimizes propellant usage by mimicking the structure of the optimal control solution, enabling efficient guidance by avoiding the complexity of solving the full optimal control problem online. Extensive Monte Carlo simulations of a Uranus aerocapture mission demonstrate that ABAMGuid+ increases capture success rates and reduces post-capture propellant requirements relative to previous methods.
comment: 30 pages, 9 figures, 3 tables. Submitted to the Journal of Guidance, Control, and Dynamics
Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning
This paper addresses the problem of parameter privacy-preserving data sharing in coupled systems, where a data provider shares data with a data user but wants to protect its sensitive parameters. The shared data affects not only the data user's decision-making but also the data provider's operations through system interactions. To trade off control performance and privacy, we propose an interaction-aware privacy-preserving data sharing approach. Our approach generates distorted data by minimizing a combination of (i) mutual information, quantifying privacy leakage of sensitive parameters, and (ii) the impact of distorted data on the data provider's control performance, considering the interactions between stakeholders. The optimization problem is formulated into a Bellman equation and solved by a particle filter reinforcement learning (RL)-based approach. Compared to existing RL-based methods, our formulation significantly reduces history dependency and efficiently handles scenarios with continuous state space. Validated in a mixed-autonomy platoon scenario, our method effectively protects sensitive driving behavior parameters of human-driven vehicles (HDVs) against inference attacks while maintaining negligible impact on fuel efficiency.
comment: 21 pages, 8 figures, accepted at the 7th Annual Learning for Dynamics and Control (L4DC) Conference, 2025
Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds
This paper introduces a centralized approach for fuel-efficient urban platooning by leveraging real-time Vehicle- to-Everything (V2X) communication and Signal Phase and Timing (SPaT) data. A nonlinear Model Predictive Control (MPC) algorithm optimizes the trajectories of platoon leader vehicles, employing an asymmetric cost function to minimize fuel-intensive acceleration. Following vehicles utilize a gap- and velocity-based control strategy, complemented by dynamic platoon splitting logic communicated through Platoon Control Messages (PCM) and Platoon Awareness Messages (PAM). Simulation results obtained from the CARLA environment demonstrate substantial fuel savings of up to 41.2%, along with smoother traffic flows, fewer vehicle stops, and improved intersection throughput.
comment: Accepted for publication at IV 2025
Zero Dynamics Attack Detection and Isolation in Cyber-Physical Systems with Event-triggered Communication
This paper investigates the problem of Zero Dynamics (ZD) cyber-attack detection and isolation in Cyber-Physical Systems (CPS). By utilizing the notion of auxiliary systems with event-based communications, we will develop a detection mechanism capable of detecting and isolating the ZD cyber-attack even when the attackers have full knowledge of the dynamics of the auxiliary system and can launch False Data Injection (FDI) attacks on all the communication channels. More specifically, we will utilize a self-triggering rule for the communication channels connecting the auxiliary system with the Command & Control (C&C) center, leveraging its properties to detect the ZD cyber-attack. Finally, the effectiveness and capabilities of our approach are verified and demonstrated through simulation case studies.
comment: 9 pages, 7 figures
Extending the Control Plane of Container Orchestrators for I/O Virtualization
Single Root Input/Output Virtualization (SR-IOV) is a standard technology for forking a single PCI express device and providing it to applications while ensuring performance isolation. It enables container orchestrators to share a limited number of physical network interfaces without incurring significant virtualization overhead. The allocation of virtualized network devices to containers, however, needs to be more configurable based on the bandwidth needs of running applications. Moreover, container orchestrators' network control over the virtualized interfaces is limited by the abilities of SR-IOV. We explore the design considerations for a system with controlled SR-IOV virtualization and present ConRDMA, a novel architecture that enables fine control of RDMA virtualization for containers. Our evaluation shows that ConRDMA enables containers to use RDMA allocated bandwidth more efficiently and to select best-suited nodes to meet their varying communication requirements.
Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI
Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained. In such systems, the access point (AP) plays a key role in forwarding user requests to a server based on its latest received service information. This paper considers a single-source, single-destination system and introduces an Age-and-Value-Aware (AVA) metric that jointly captures both the timeliness and the task relevance of service information. Unlike traditional freshness-based metrics, AVA explicitly incorporates variations in server-side service capacity and AP forwarding decisions, allowing more context-aware update evaluation. Building upon AVA, we propose a reinforcement learning-based update policy that learns to selectively transmit service information updates to the AP. It aims to maximize overall task success while minimizing unnecessary communications. Extensive simulations under diverse user request patterns and varying service capacities demonstrate that AVA reduces the update frequency by over 90% on average compared to baselines, with reductions reaching 98% in certain configurations. Crucially, this reduction is achieved without compromising the accuracy of task execution or the quality of decision making.
comment: 11pages, 40 figures
On the Potential of Electrified Supply Chains to Provide Long Duration Demand Flexibility
Demand flexibility can offset some of the variability introduced on the supply-side by variable renewable generation. However, most efforts (e.g. control of residential vehicle charging) focus on short durations -- typically on the scale of minutes to hours. This paper investigates whether a fully electrified supply chain (transport and manufacturing) could provide demand flexibility over longer durations, exploiting the latency that typically exists between the processing of raw material to the delivery of finished product. Using a case study of the cement industry along the East Coast of the United States, we demonstrate that electrified supply chains could shift gigawatt-hours (GWh) of electricity demand for durations of more than a week, largely following wind power variability. Furthermore, we show that this occurs using low levels of carbon taxing (below $50/tn), at which battery storage is not economically viable. A sensitivity analysis shows potential to provide flexibility in all considered cost scenarios, although where the flexibility comes from can change (e.g. transport vs manufacturing). We show that today's cost of electrified heavy goods vehicles are the most significant parameter -- with substantially lower costs yielding a more demand-flexible supply chain.
Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications
This paper demonstrates the applicability of the safe model predictive control (SMPC) framework to autonomous driving scenarios, focusing on the design of adaptive cruise control (ACC) and automated lane-change systems. Building on the SMPC approach with priority-driven constraint softening -- which ensures the satisfaction of \emph{hard} constraints under external disturbances by selectively softening a predefined subset of adjustable constraints -- we show how the algorithm dynamically relaxes lower-priority, comfort-related constraints in response to unexpected disturbances while preserving critical safety requirements such as collision avoidance and lane-keeping. A learning-based algorithm approximating the time consuming SMPC is introduced to enable real-time execution. Simulations in real-world driving scenarios subject to unpredicted disturbances confirm that this prioritized softening mechanism consistently upholds stringent safety constraints, underscoring the effectiveness of the proposed method.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control. arXiv admin note: text overlap with arXiv:2503.15373
Design and Application of Energy-saving Sub-Optimal Sliding Mode Control
The recently introduced energy-saving extension of the sub-optimal sliding mode control (SOSMC), which is known in the literature for the last two and half decades, incorporates a control-off mode that allows for saving energy during the finite-time convergence process. This novel energy-saving algorithm (denoted by ES-SOSMC) assumes the systems with relative degree two between the sliding variable and the switching control with a bounded magnitude, while the matched upper-bounded perturbations are not necessarily continuous. The design and practical application of the ES-SOSMC are the subject of this chapter. A method for parameterizing the ES-SOSMC through a constrained minimization of the energy cost function is recalled which guarantees the total energy consumption is lower than that of the conventional SOSMC. Also the residual steady-state oscillations (chattering), occurring when additional (actuator) dynamics are taken into account, are addressed. An application example for scanning and machining a rough surface, both of which require a stiff position control in contact with a moving surface, demonstrates practical suitability of the control. Here, ES-SOSMC is compared with SOSMC by showing an equivalent tracking and stabilization performance and evaluating the energy-saving operation with respect to a fuel consumption norm.
comment: 23 pages, 11 figures
Shortlisting Protection Configurations for HVDC Grids and Electrical Energy Hubs
This paper proposes a methodology for shortlisting protection system configurations for large HVDC switching stations, which are expected in multiterminal HVDC grids and electrical energy hubs (or energy islands). This novel approach focuses on the configuration of protection equipment and the arrangement of lines and converters in various protection zones, instead of expert decisions on protection strategies based on numerous simulations. A graph-based approach that allows high-level evaluation of possible DC fault impacts is presented. This fault impact evaluation method can evaluate many possible protection configurations allowing the selection of less obvious choices, as experts cannot consider all possible configurations, especially when the switching station size increases. A filtering process is applied to reduce the number of possible configurations based on multiple protection performance metrics which are evaluated for different power flow scenarios. The results for these performance metrics can be compared for configurations with different numbers of HVDC circuit breakers to assess the benefit of increasing the amount of protection equipment in different network topologies. It is also shown that, through continued filtering using additional performance metrics or fault scenarios, the number of possible breaker, cable and converter configurations can be further reduced, leading to a protection design that is well suited for many operational scenarios. The results of the shortlisting process provide insights on the required number of HVDC circuit breakers to limit fault impacts to a given value. Moreover, observed trends in the results could, in future studies, contribute to new design principles and priorities, allowing system developers to more effectively design HVDC protection systems for different operational scenarios and possible investment levels.
Integrating Building Thermal Flexibility Into Distribution System: A Privacy-Preserved Dispatch Approach
The inherent thermal storage capacity of buildings brings considerable thermal flexibility to the heating/cooling loads, which are promising demand response resources for power systems. It is widely believed that integrating the thermal flexibility of buildings into the distribution system can improve the operating economy and reliability of the system. However, the private information of the buildings needs to be transferred to the distribution system operator (DSO) to achieve a coordinated optimization, bringing serious privacy concerns to users. Given this issue, we propose a novel privacy-preserved optimal dispatch approach for the distribution system incorporating buildings. Using it, the DSO can exploit the thermal flexibility of buildings without accessing their private information, such as model parameters and indoor temperature profiles. Specifically, we first develop an optimal dispatch model for the distribution system integrating buildings, which can be extended to other storage-like flexibility resources. Second, we reveal that the privacy-preserved integration of buildings is a joint privacy preservation problem for both parameters and state variables and then design a privacy-preserved algorithm based on transformation-based encryption, constraint relaxation, and constraint extension techniques. Besides, we implement a detailed privacy analysis for the proposed method, considering both semi-honest adversaries and external eavesdroppers. Case studies demonstrate the accuracy, privacy-preserved performance, and computational efficiency of the proposed method.
comment: Accepted for publication in IEEE Transactions on Industrial Informatics
Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields
This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on the consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.
Automatic Basis Function Selection in Iterative Learning Control: A Sparsity-Promoting Approach Applied to an Industrial Printer
Iterative learning control (ILC) techniques are capable of improving the tracking performance of control systems that repeatedly perform similar tasks by utilizing data from past iterations. The aim of this paper is to design a systematic approach for learning parameterized feedforward signals with limited complexity. The developed method involves an iterative learning control in conjunction with a data-driven sparse subset selection procedure for basis function selection. The ILC algorithm that employs sparse optimization is able to automatically select relevant basis functions and is validated on an industrial flatbed printer.
Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency
Heating, Ventilation, and Air Conditioning (HVAC) systems account for approximately 38% of building energy consumption globally, making them one of the most energy-intensive services. The increasing emphasis on energy efficiency and sustainability, combined with the need for enhanced occupant comfort, presents a significant challenge for traditional HVAC systems. These systems often fail to dynamically adjust to real-time changes in electricity market rates or individual comfort preferences, leading to increased energy costs and reduced comfort. In response, we propose a Human-in-the-Loop (HITL) Artificial Intelligence framework that optimizes HVAC performance by incorporating real-time user feedback and responding to fluctuating electricity prices. Unlike conventional systems that require predefined information about occupancy or comfort levels, our approach learns and adapts based on ongoing user input. By integrating the occupancy prediction model with reinforcement learning, the system improves operational efficiency and reduces energy costs in line with electricity market dynamics, thereby contributing to demand response initiatives. Through simulations, we demonstrate that our method achieves significant cost reductions compared to baseline approaches while maintaining or enhancing occupant comfort. This feedback-driven approach ensures personalized comfort control without the need for predefined settings, offering a scalable solution that balances individual preferences with economic and environmental goals.
comment: ACM e-Energy 2025
Formation Maneuver Control Based on the Augmented Laplacian Method
This paper proposes a novel formation maneuver control method for both 2-D and 3-D space, which enables the formation to translate, scale, and rotate with arbitrary orientation. The core innovation is the novel design of weights in the proposed augmented Laplacian matrix. Instead of using scalars, we represent weights as matrices, which are designed based on a specified rotation axis and allow the formation to perform rotation in 3-D space. To further improve the flexibility and scalability of the formation, the rotational axis adjustment approach and dynamic agent reconfiguration method are developed, allowing formations to rotate around arbitrary axes in 3-D space and new agents to join the formation. Theoretical analysis is provided to show that the proposed approach preserves the original configuration of the formation. The proposed method maintains the advantages of the complex Laplacian-based method, including reduced neighbor requirements and no reliance on generic or convex nominal configurations, while achieving arbitrary orientation rotations via a more simplified implementation. Simulations in both 2-D and 3-D space validate the effectiveness of the proposed method.
DeepSync: A Learning Framework for Pervasive Localization using Code Synchronization on Compressed Cellular Spectrum
Pervasive localization is essential for continuous tracking applications, yet existing solutions face challenges in balancing power consumption and accuracy. GPS, while precise, is impractical for continuous tracking of micro-assets due to high power requirements. Recent advances in non-linear compressed spectrum sensing offer low-power alternatives, but existing implementations achieve only coarse positioning through Received Signal Strength Indicator (RSSI) measurements. We present DeepSync, a deep learning framework that enables precise localization using compressed cellular spectrum. Our key technical insight lies in formulating sub-sample timing estimation as a template matching problem, solved through a novel architecture combining temporal CNN encoders for multi-frame processing with cross-attention mechanisms. The system processes non-linear inter-modulated spectrum through hierarchical feature extraction, achieving robust performance at SNR levels below -10dB -- a regime where conventional timing estimation fails. By integrating real cellular infrastructure data with physics-based ray-tracing simulations, DeepSync achieves 2.128-meter median accuracy while consuming significantly less power than conventional systems. Real-world evaluations demonstrate 10x improvement over existing compressed spectrum approaches, establishing a new paradigm for ultra-low-power localization.
A Feedback Control Framework for Incentivised Suburban Parking Utilisation and Urban Core Traffic Relief
Urban traffic congestion, exacerbated by inefficient parking management and cruising for parking, significantly hampers mobility and sustainability in smart cities. Drivers often face delays searching for parking spaces, influenced by factors such as accessibility, cost, distance, and available services such as charging facilities in the case of electric vehicles. These inefficiencies contribute to increased urban congestion, fuel consumption, and environmental impact. Addressing these challenges, this paper proposes a feedback control incentivisation-based system that aims to better distribute vehicles between city and suburban parking facilities offering park-and-charge/-ride services. Individual driver behaviours are captured via discrete choice models incorporating factors of importance to parking location choice among drivers, such as distance to work, public transport connectivity, charging infrastructure availability, and amount of incentive offered; and are regulated through principles of ergodic control theory. The proposed framework is applied to an electric vehicle park-and-charge/-ride problem, and demonstrates how predictable long-term behaviour of the system can be guaranteed.
Connected C-Core Hybrid SRMs for EV Applications
This paper proposes a new class of permanent magnet-assisted three-phase switched reluctance motors (PM-SRMs) designed to achieve significantly higher torque density for electric vehicle (EV) propulsion systems. Eight distinct motor topologies are systematically investigated, including a non-PM baseline design, three innovative PM arrangement strategies, and two optimized rotor/stator teeth configurations (22-pole and 26-pole variants). The study presents analytical models including magnetic equivalent circuits (MECs), detailed operating principles, and generalized design formulations that account for both electromagnetic and structural considerations. A key contribution is the introduction of the point-of-conversion (PoC) concept, which optimizes PM placement by minimizing magnetic path reluctance. Comparative analysis demonstrates torque density improvements over conventional SRMs and existing PM-assisted designs while maintaining structural robustness. Experimental validation confirms that the proposed 24/22 configuration with inter-phase PMs delivers higher torque per PM volume compared to state-of-the-art designs. The findings provide insights for EV motor designers seeking to balance performance, cost, and reliability.
Robust Management of Airport Security Queues Considering Passenger Non-compliance with Chance-Constrained Optimization
The long waiting time at airport security has become an emergent issue as demand for air travel continues to grow. Not only does queuing at security cause passengers to miss their flights, but also reduce the amount of time passengers spend at the airport post-security, potentially leading to less revenue for the airport operator. One of the key issues to address to reduce waiting time is the management of arrival priority. As passengers on later flights can arrive before passengers on earlier flights, the security system does not always process passengers in the order of the degree of urgency. In this paper, we propose a chance-constrained optimization model that decides in which time slot passengers should be recommended to arrive. We use chance constraints to obtain solutions that take the uncertainty in passenger non-compliance into account. The experimental results, based on a sample day of flight schedules at the Barcelona airport, show a reduction of 85% in the total waiting time. Compared to the deterministic case, in which passengers are assumed to fully comply with the recommendations, we see a 30% increase in the reduction of the total waiting time. This highlights the importance of considering variation in passenger compliance in the management of airport security queues.
Handling Pedestrian Uncertainty in Coordinating Autonomous Vehicles at Signal-Free Intersections
In this paper, we provide a theoretical framework for the coordination of connected and automated vehicles (CAVs) at signal-free intersections, accounting for the unexpected presence of pedestrians. First, we introduce a general vehicle-to-infrastructure communication protocol and a low-level controller that determines the optimal unconstrained trajectories for CAVs, in terms of fuel efficiency and travel time, to cross the intersection without considering pedestrians. If such an unconstrained trajectory is unattainable, we introduce sufficient certificates for each CAV to cross the intersection while respecting the associated constraints. Next, we consider the case where an unexpected pedestrian enters the road. When the CAV's sensors detect a pedestrian, an emergency mode is activated, which imposes certificates related to an unsafe set in the pedestrian's proximity area. Simultaneously, a re-planning mechanism is implemented for all CAVs to accommodate the trajectories of vehicles operating in emergency mode. Finally, we validate the efficacy of our approach through simulations conducted in MATLAB and RoadRunner softwares, which facilitate the integration of sensor tools and the realization of real-time implementation.
Autonomous Vision-Based Magnetic Microrobotic Pushing of Micro-Objects and Cells
Accurate and autonomous transportation of micro-objects and biological cells can enable significant advances in a wide variety of research disciplines. Here, we present a novel, vision-based, model-free microrobotic pushing algorithm for the autonomous manipulation of micro objects and biological cells. The algorithm adjusts the axis of a rotating magnetic field that in turn controls the heading angle and spin axis of a spherical Janus rolling microrobot. We introduce the concept of a microrobotic guiding corridor to constrain the object and to avoid pushing failures. We then show that employing only two simple conditions, the microrobot is able to successfully and autonomously push microscale objects along predefined trajectories. We evaluate the performance of the algorithm by measuring the mean absolute error and completion time relative to a desired path at different actuation frequencies and guiding corridor widths. Finally, we demonstrate biomedical applicability by autonomously transporting a single biological cell, highlighting the methods potential for applications in tissue engineering, drug delivery and synthetic biology.
Development of Reduced Feeder and Load Models Using Practical Topological and Loading Data
Distribution feeder and load model reduction methods are essential for maintaining a good tradeoff between accurate representation of grid behavior and reduced computational complexity in power system studies. An effective algorithm to obtain a reduced order representation of the practical feeders using utility topological and loading data has been presented in this paper. Simulations conducted in this work show that the reduced feeder and load model of a utility feeder, obtained using the proposed method, can accurately capture contactor and motor stalling behaviors for critical events such as fault induced delayed voltage recovery.
Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models
As AI models scale to billions of parameters and operate with increasing autonomy, ensuring their safe, reliable operation demands engineering-grade security and assurance frameworks. This paper presents an enterprise-level, risk-aware, security-by-design approach for large-scale autonomous AI systems, integrating standardized threat metrics, adversarial hardening techniques, and real-time anomaly detection into every phase of the development lifecycle. We detail a unified pipeline - from design-time risk assessments and secure training protocols to continuous monitoring and automated audit logging - that delivers provable guarantees of model behavior under adversarial and operational stress. Case studies in national security, open-source model governance, and industrial automation demonstrate measurable reductions in vulnerability and compliance overhead. Finally, we advocate cross-sector collaboration - uniting engineering teams, standards bodies, and regulatory agencies - to institutionalize these technical safeguards within a resilient, end-to-end assurance ecosystem for the next generation of AI.
Direct Data Driven Control Using Noisy Measurements
This paper presents a novel direct data-driven control framework for solving the linear quadratic regulator (LQR) under disturbances and noisy state measurements. The system dynamics are assumed unknown, and the LQR solution is learned using only a single trajectory of noisy input-output data while bypassing system identification. Our approach guarantees mean-square stability (MSS) and optimal performance by leveraging convex optimization techniques that incorporate noise statistics directly into the controller synthesis. First, we establish a theoretical result showing that the MSS of an uncertain data-driven system implies the MSS of the true closed-loop system. Building on this, we develop a robust stability condition using linear matrix inequalities (LMIs) that yields a stabilizing controller gain from noisy measurements. Finally, we formulate a data-driven LQR problem as a semidefinite program (SDP) that computes an optimal gain, minimizing the steady-state covariance. Extensive simulations on benchmark systems -- including a rotary inverted pendulum and an active suspension system -- demonstrate the superior robustness and accuracy of our method compared to existing data-driven LQR approaches. The proposed framework offers a practical and theoretically grounded solution for controller design in noise-corrupted environments where system identification is infeasible.
comment: Submitted to IEEE-TAC
Reconstructing Brain Causal Dynamics for Subject and Task Fingerprints using fMRI Time-series Data
Purpose: Recently, there has been a revived interest in system neuroscience causation models, driven by their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, we present a novel method that leverages causal dynamics to achieve effective fMRI-based subject and task fingerprinting. Methods: By applying an implicit-explicit discretization scheme, we develop a two-timescale linear state-space model. Through data-driven identification of its parameters, the model captures causal signatures, including directed interactions among brain regions from a spatial perspective, and disentangled fast and slow dynamic modes of brain activity from a temporal perspective. These causal signatures are then integrated with: (i) a modal decomposition and projection method for model-based subject identification, and (ii) a Graph Neural Network (GNN) framework for learning-based task classification. Furthermore, we introduce the concept of the brain reachability landscape as a novel visualization tool, which quantitatively characterizes the maximum possible activation levels of brain regions under various fMRI tasks. Results: We evaluate the proposed approach using the Human Connectome Project dataset and demonstrate its advantage over non-causality-based methods. The obtained causal signatures are visualized and demonstrate clear biological relevance with established understandings of brain function. Conclusion: We verified the feasibility and effectiveness of utilizing brain causal signatures for subject and task fingerprinting. Additionally, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.
Interval reduced-order switched positive observers for uncertain switched positive linear systems
In this paper, existence conditions and a design procedure of reduced-order switched positive observers for continuous- and discrete-time switched positive linear systems with uncertainty are established. In the analyzed class, arbitrary switching is permitted, whereas the uncertainty expressed via matrix inequalities concerns both the initial state and system parameters. Positive lower and positive upper interval switched observers are obtained. The proposed observers are of (n - p) order, where n is the dimension of the state vector and p is the rank of the output matrix, i.e., p-dimensional measurement information. Moreover, as a special case, existence conditions and a design procedure of reduced-order positive observers for uncertain positive linear systems without switching are provided. The theoretical findings are illustrated by two numerical examples for continuous- and discrete-time systems.
comment: 11 pages, 4 figures
Modelling of a DC-DC Buck Converter Using Long-Short-Term-Memory (LSTM)
Artificial neural networks make it possible to identify black-box models. Based on a recurrent nonlinear autoregressive exogenous neural network, this research provides a technique for simulating the static and dynamic behavior of a DC-DC power converter. This approach employs an algorithm for training a neural network using the inputs and outputs (currents and voltages) of a Buck converter. The technique is validated using simulated data of a realistic Simulink-programmed nonsynchronous Buck converter model and experimental findings. The correctness of the technique is determined by comparing the predicted outputs of the neural network to the actual outputs of the system, thereby confirming the suggested strategy. Simulation findings demonstrate the practicability and precision of the proposed black-box method.
comment: Outdated results
Exact Imposition of Safety Boundary Conditions in Neural Reachable Tubes ICRA 2025
Hamilton-Jacobi (HJ) reachability analysis is a widely adopted verification tool to provide safety and performance guarantees for autonomous systems. However, it involves solving a partial differential equation (PDE) to compute a safety value function, whose computational and memory complexity scales exponentially with the state dimension, making its direct application to large-scale systems intractable. To overcome these challenges, DeepReach, a recently proposed learning-based approach, approximates high-dimensional reachable tubes using neural networks (NNs). While shown to be effective, the accuracy of the learned solution decreases with system complexity. One of the reasons for this degradation is a soft imposition of safety constraints during the learning process, which corresponds to the boundary conditions of the PDE, resulting in inaccurate value functions. In this work, we propose ExactBC, a variant of DeepReach that imposes safety constraints exactly during the learning process by restructuring the overall value function as a weighted sum of the boundary condition and the NN output. Moreover, the proposed variant no longer needs a boundary loss term during the training process, thus eliminating the need to balance different loss terms. We demonstrate the efficacy of the proposed approach in significantly improving the accuracy of the learned value function for four challenging reachability tasks: a rimless wheel system with state resets, collision avoidance in a cluttered environment, autonomous rocket landing, and multi-aircraft collision avoidance.
comment: First two authors have contributed equally. 7 Pages, 3 figures. Accepted at ICRA 2025
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
Reinforcement Learning (RL) methods used for solving real-world optimization problems often involve dynamic state-action spaces, larger scale, and sparse rewards, leading to significant challenges in convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a novel residual connection between input and output tokens crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT tackles the sparse rewards limitations of online RL algorithms and delivers high-quality solutions in real-time. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and prove that its performance is superior and requires significantly fewer training trajectories, thus improving sample efficiency compared to existing DT and offline RL baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior offline and online RL approaches.
GreenLight-Gym: Reinforcement learning benchmark environment for control of greenhouse production systems
This study presents GreenLight-Gym, a new, fast, open-source benchmark environment for developing reinforcement learning (RL) methods in greenhouse crop production control. Built on the state-of-the-art GreenLight model, it features a differentiable C++ implementation leveraging the CasADi framework for efficient numerical integration. GreenLight-Gym improves simulation speed by a factor of 17 over the original GreenLight implementation. A modular Python environment wrapper enables flexible configuration of control tasks and RL-based controllers. This flexibility is demonstrated by learning controllers under parametric uncertainty using two well-known RL algorithms. GreenLight-Gym provides a standardized benchmark for advancing RL methodologies and evaluating greenhouse control solutions under diverse conditions. The greenhouse control community is encouraged to use and extend this benchmark to accelerate innovation in greenhouse crop production.
comment: This submission replaces our previous pre-print with the version accepted to the 2025 IFAC conference. A new Git repository (https://github.com/BartvLaatum/GreenLight-Gym) accompanies this paper; the repository for the prior version remains live at https://github.com/YourOrg/GreenLightGym. The earlier pre-print is still available on ArXiv under the previous submission number
Parameter Invariance Analysis of Moment Equations Using Dulmage-Mendelsohn Decomposition
Living organisms maintain stable functioning amid environmental fluctuations through homeostasis, a property that preserves a system's behavior despite changes in environmental conditions. To elucidate homeostasis in stochastic biochemical reactions, theoretical tools for assessing population-level invariance under parameter perturbations are crucial. In this paper, we propose a systematic method for identifying the stationary moments that remain invariant under parameter perturbations by leveraging the structural properties of the stationary moment equations. A key step in this development is addressing the underdetermined nature of moment equations, which has traditionally made it difficult to characterize how stationary moments depend on system parameters. To overcome this, we utilize the Dulmage-Mendelsohn (DM) decomposition of the coefficient matrix to extract welldetermined subequations and reveal their hierarchical structure. Leveraging this structure, we identify stationary moments whose partial derivatives with respect to parameters are structurally zero, facilitating the exploration of fundamental constraints that govern homeostatic behavior in stochastic biochemical systems.
A First-Order Gradient Approach for the Connectivity Optimization of Markov Chains
Graphs are commonly used to model various complex systems, including social networks, power grids, transportation networks, and biological systems. In many applications, the connectivity of these networks can be expressed through the Mean First Passage Times (MFPTs) of a Markov chain modeling a random walker on the graph. In this paper, we generalize the network metrics based on Markov chains' MFPTs and extend them to networks affected by uncertainty, in which edges may fail and hence not be present according to a pre-determined stochastic model. To find optimally connected Markov chains, we present a parameterization-free method for optimizing the MFPTs of the Markov chain. More specifically, we present an efficient Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm in the context of Markov chain optimization. The proposed algorithm is suitable for both fixed and random networks. Using various numerical experiments, we demonstrate scalability compared to established benchmarks. Importantly, our algorithm finds an optimal solution without requiring prior knowledge of edge failure probabilities, allowing for an online optimization approach.
comment: 15 pages, 6 figures
Tuning a Cascaded Online Feedback Optimization Controller for Provision of Distributed Flexibility
Coordinating a high number of flexibility providing units (e.g. to provide ancillary services for the transmission system) across various grid layers requires new control concepts. A flexibility request at a point of common coupling can be met by utilizing a cascaded control structure based on online feedback optimization. In this paper the influence of the parameterization of the individual controllers on the performance of the hierarchical flexibility provision is studied on a three-level test system. The results show a high interdependency between the choice of control parameters of one controller and the behavior of other controllers as well as a significant impact on the accuracy and speed of flexibility provision. With a careful tuning, a cascaded structure based on online feedback optimization can achieve efficient vertical coordination of flexibility providing units.
comment: Published in 2024 International Conference on Smart Energy Systems and Technologies (SEST)
Opacity Enforcement by Edit Functions Under Incomparable Observations
As an information-flow privacy property, opacity characterizes whether a malicious external observer (referred to as an intruder) is able to infer the secret behavior of a system. This paper addresses the problem of opacity enforcement using edit functions in discrete event systems modeled by partially observed deterministic finite automata. A defender uses the edit function as an interface at the output of a system to manipulate actual observations through insertion, substitution, and deletion operations so that the intruder will be prevented from inferring the secret behavior of the system. Unlike existing work which usually assumes that the observation capabilities of the intruder and the defender are identical, we consider a more general setting where they may observe incomparable subsets of events generated by the system.To characterize whether the defender has the ability to enforce opacity of the system under this setting, the notion of \emph{$ic$-enforceability} is introduced. Then, the opacity enforcement problem is transformed to a two-player game, with imperfect information between the system and the defender, which can be used to determine a feasible decision-making strategy for the defender. Within the game scheme, an edit mechanism is constructed to enumerate all feasible edit actions following system behavior. We further show that an $ic$-enforcing edit function (if one exists) can be synthesized from the edit mechanism to enforce opacity.
Carbon-Aware Quality Adaptation for Energy-Intensive Services
The energy demand of modern cloud services, particularly those related to generative AI, is increasing at an unprecedented pace. To date, carbon-aware computing strategies have primarily focused on batch process scheduling or geo-distributed load balancing. However, such approaches are not applicable to services that require constant availability at specific locations due to latency, privacy, data, or infrastructure constraints. In this paper, we explore how the carbon footprint of energy-intensive services can be reduced by adjusting the fraction of requests served by different service quality tiers. We show that adapting this quality of responses with respect to grid carbon intensity can lead to additional carbon savings beyond resource and energy efficiency and introduce a forecast-based multi-horizon optimization that reaches close-to-optimal carbon savings.
comment: Published at e-Energy'25
Systems and Control (EESS)
Leveraging Multi-Task Learning for Multi-Label Power System Security Assessment
This paper introduces a novel approach to the power system security assessment using Multi-Task Learning (MTL), and reformulating the problem as a multi-label classification task. The proposed MTL framework simultaneously assesses static, voltage, transient, and small-signal stability, improving both accuracy and interpretability with respect to the most state of the art machine learning methods. It consists of a shared encoder and multiple decoders, enabling knowledge transfer between stability tasks. Experiments on the IEEE 68-bus system demonstrate a measurable superior performance of the proposed method compared to the extant state-of-the-art approaches.
Robust Multi-Agent Decision-Making in Finite-Population Games
We study the robustness of an agent decision-making model in finite-population games, with a particular focus on the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model. Specifically, we examine how the model's parameters influence the effects of various sources of noise and modeling inaccuracies -- factors commonly encountered in engineering applications of population games -- on agents' decision-making. Our analysis provides insights into how these parameters can be effectively tuned to mitigate such effects. Theoretical results are supported by numerical examples and simulation studies that validate the analysis and illustrate practical strategies for parameter selection.
ABAMGuid+: An Enhanced Aerocapture Guidance Framework using Augmented Bank Angle Modulation
Aerocapture consists of converting a hyperbolic approach trajectory into a captured target orbit utilizing the aerodynamic forces generated via a single pass through the atmosphere. Aerocapture guidance systems must be robust to significant environmental variations and modeling uncertainty, particularly regarding atmospheric properties and delivery conditions. Recent work has shown that enabling control over both bank angle and angle of attack, a strategy referred to as augmented bank angle modulation (ABAM), can improve robustness to entry state and atmospheric uncertainties. In this work, we derive optimal control solutions for an aerocapture vehicle using ABAM. We first formulate the problem using a linear aerodynamic model and derive closed-form optimal control profiles using Pontryagin's Minimum Principle. To increase modeling fidelity, we also consider a quadratic aerodynamic model and obtain the solution directly using the optimality conditions. Both formulations are solved numerically using Gauss pseudospectral methods (via GPOPS, a software tool for pseudospectral optimal control), to validate the analytic solutions. We then introduce a novel aerocapture guidance algorithm, ABAMGuid+, which indirectly minimizes propellant usage by mimicking the structure of the optimal control solution, enabling efficient guidance by avoiding the complexity of solving the full optimal control problem online. Extensive Monte Carlo simulations of a Uranus aerocapture mission demonstrate that ABAMGuid+ increases capture success rates and reduces post-capture propellant requirements relative to previous methods.
comment: 30 pages, 9 figures, 3 tables. Submitted to the Journal of Guidance, Control, and Dynamics
Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning
This paper addresses the problem of parameter privacy-preserving data sharing in coupled systems, where a data provider shares data with a data user but wants to protect its sensitive parameters. The shared data affects not only the data user's decision-making but also the data provider's operations through system interactions. To trade off control performance and privacy, we propose an interaction-aware privacy-preserving data sharing approach. Our approach generates distorted data by minimizing a combination of (i) mutual information, quantifying privacy leakage of sensitive parameters, and (ii) the impact of distorted data on the data provider's control performance, considering the interactions between stakeholders. The optimization problem is formulated into a Bellman equation and solved by a particle filter reinforcement learning (RL)-based approach. Compared to existing RL-based methods, our formulation significantly reduces history dependency and efficiently handles scenarios with continuous state space. Validated in a mixed-autonomy platoon scenario, our method effectively protects sensitive driving behavior parameters of human-driven vehicles (HDVs) against inference attacks while maintaining negligible impact on fuel efficiency.
comment: 21 pages, 8 figures, accepted at the 7th Annual Learning for Dynamics and Control (L4DC) Conference, 2025
Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds
This paper introduces a centralized approach for fuel-efficient urban platooning by leveraging real-time Vehicle- to-Everything (V2X) communication and Signal Phase and Timing (SPaT) data. A nonlinear Model Predictive Control (MPC) algorithm optimizes the trajectories of platoon leader vehicles, employing an asymmetric cost function to minimize fuel-intensive acceleration. Following vehicles utilize a gap- and velocity-based control strategy, complemented by dynamic platoon splitting logic communicated through Platoon Control Messages (PCM) and Platoon Awareness Messages (PAM). Simulation results obtained from the CARLA environment demonstrate substantial fuel savings of up to 41.2%, along with smoother traffic flows, fewer vehicle stops, and improved intersection throughput.
comment: Accepted for publication at IV 2025
Zero Dynamics Attack Detection and Isolation in Cyber-Physical Systems with Event-triggered Communication
This paper investigates the problem of Zero Dynamics (ZD) cyber-attack detection and isolation in Cyber-Physical Systems (CPS). By utilizing the notion of auxiliary systems with event-based communications, we will develop a detection mechanism capable of detecting and isolating the ZD cyber-attack even when the attackers have full knowledge of the dynamics of the auxiliary system and can launch False Data Injection (FDI) attacks on all the communication channels. More specifically, we will utilize a self-triggering rule for the communication channels connecting the auxiliary system with the Command & Control (C&C) center, leveraging its properties to detect the ZD cyber-attack. Finally, the effectiveness and capabilities of our approach are verified and demonstrated through simulation case studies.
comment: 9 pages, 7 figures
Extending the Control Plane of Container Orchestrators for I/O Virtualization
Single Root Input/Output Virtualization (SR-IOV) is a standard technology for forking a single PCI express device and providing it to applications while ensuring performance isolation. It enables container orchestrators to share a limited number of physical network interfaces without incurring significant virtualization overhead. The allocation of virtualized network devices to containers, however, needs to be more configurable based on the bandwidth needs of running applications. Moreover, container orchestrators' network control over the virtualized interfaces is limited by the abilities of SR-IOV. We explore the design considerations for a system with controlled SR-IOV virtualization and present ConRDMA, a novel architecture that enables fine control of RDMA virtualization for containers. Our evaluation shows that ConRDMA enables containers to use RDMA allocated bandwidth more efficiently and to select best-suited nodes to meet their varying communication requirements.
Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI
Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained. In such systems, the access point (AP) plays a key role in forwarding user requests to a server based on its latest received service information. This paper considers a single-source, single-destination system and introduces an Age-and-Value-Aware (AVA) metric that jointly captures both the timeliness and the task relevance of service information. Unlike traditional freshness-based metrics, AVA explicitly incorporates variations in server-side service capacity and AP forwarding decisions, allowing more context-aware update evaluation. Building upon AVA, we propose a reinforcement learning-based update policy that learns to selectively transmit service information updates to the AP. It aims to maximize overall task success while minimizing unnecessary communications. Extensive simulations under diverse user request patterns and varying service capacities demonstrate that AVA reduces the update frequency by over 90% on average compared to baselines, with reductions reaching 98% in certain configurations. Crucially, this reduction is achieved without compromising the accuracy of task execution or the quality of decision making.
comment: 11pages, 40 figures
On the Potential of Electrified Supply Chains to Provide Long Duration Demand Flexibility
Demand flexibility can offset some of the variability introduced on the supply-side by variable renewable generation. However, most efforts (e.g. control of residential vehicle charging) focus on short durations -- typically on the scale of minutes to hours. This paper investigates whether a fully electrified supply chain (transport and manufacturing) could provide demand flexibility over longer durations, exploiting the latency that typically exists between the processing of raw material to the delivery of finished product. Using a case study of the cement industry along the East Coast of the United States, we demonstrate that electrified supply chains could shift gigawatt-hours (GWh) of electricity demand for durations of more than a week, largely following wind power variability. Furthermore, we show that this occurs using low levels of carbon taxing (below $50/tn), at which battery storage is not economically viable. A sensitivity analysis shows potential to provide flexibility in all considered cost scenarios, although where the flexibility comes from can change (e.g. transport vs manufacturing). We show that today's cost of electrified heavy goods vehicles are the most significant parameter -- with substantially lower costs yielding a more demand-flexible supply chain.
Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications
This paper demonstrates the applicability of the safe model predictive control (SMPC) framework to autonomous driving scenarios, focusing on the design of adaptive cruise control (ACC) and automated lane-change systems. Building on the SMPC approach with priority-driven constraint softening -- which ensures the satisfaction of \emph{hard} constraints under external disturbances by selectively softening a predefined subset of adjustable constraints -- we show how the algorithm dynamically relaxes lower-priority, comfort-related constraints in response to unexpected disturbances while preserving critical safety requirements such as collision avoidance and lane-keeping. A learning-based algorithm approximating the time consuming SMPC is introduced to enable real-time execution. Simulations in real-world driving scenarios subject to unpredicted disturbances confirm that this prioritized softening mechanism consistently upholds stringent safety constraints, underscoring the effectiveness of the proposed method.
comment: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control. arXiv admin note: text overlap with arXiv:2503.15373
Design and Application of Energy-saving Sub-Optimal Sliding Mode Control
The recently introduced energy-saving extension of the sub-optimal sliding mode control (SOSMC), which is known in the literature for the last two and half decades, incorporates a control-off mode that allows for saving energy during the finite-time convergence process. This novel energy-saving algorithm (denoted by ES-SOSMC) assumes the systems with relative degree two between the sliding variable and the switching control with a bounded magnitude, while the matched upper-bounded perturbations are not necessarily continuous. The design and practical application of the ES-SOSMC are the subject of this chapter. A method for parameterizing the ES-SOSMC through a constrained minimization of the energy cost function is recalled which guarantees the total energy consumption is lower than that of the conventional SOSMC. Also the residual steady-state oscillations (chattering), occurring when additional (actuator) dynamics are taken into account, are addressed. An application example for scanning and machining a rough surface, both of which require a stiff position control in contact with a moving surface, demonstrates practical suitability of the control. Here, ES-SOSMC is compared with SOSMC by showing an equivalent tracking and stabilization performance and evaluating the energy-saving operation with respect to a fuel consumption norm.
comment: 23 pages, 11 figures
Shortlisting Protection Configurations for HVDC Grids and Electrical Energy Hubs
This paper proposes a methodology for shortlisting protection system configurations for large HVDC switching stations, which are expected in multiterminal HVDC grids and electrical energy hubs (or energy islands). This novel approach focuses on the configuration of protection equipment and the arrangement of lines and converters in various protection zones, instead of expert decisions on protection strategies based on numerous simulations. A graph-based approach that allows high-level evaluation of possible DC fault impacts is presented. This fault impact evaluation method can evaluate many possible protection configurations allowing the selection of less obvious choices, as experts cannot consider all possible configurations, especially when the switching station size increases. A filtering process is applied to reduce the number of possible configurations based on multiple protection performance metrics which are evaluated for different power flow scenarios. The results for these performance metrics can be compared for configurations with different numbers of HVDC circuit breakers to assess the benefit of increasing the amount of protection equipment in different network topologies. It is also shown that, through continued filtering using additional performance metrics or fault scenarios, the number of possible breaker, cable and converter configurations can be further reduced, leading to a protection design that is well suited for many operational scenarios. The results of the shortlisting process provide insights on the required number of HVDC circuit breakers to limit fault impacts to a given value. Moreover, observed trends in the results could, in future studies, contribute to new design principles and priorities, allowing system developers to more effectively design HVDC protection systems for different operational scenarios and possible investment levels.
Integrating Building Thermal Flexibility Into Distribution System: A Privacy-Preserved Dispatch Approach
The inherent thermal storage capacity of buildings brings considerable thermal flexibility to the heating/cooling loads, which are promising demand response resources for power systems. It is widely believed that integrating the thermal flexibility of buildings into the distribution system can improve the operating economy and reliability of the system. However, the private information of the buildings needs to be transferred to the distribution system operator (DSO) to achieve a coordinated optimization, bringing serious privacy concerns to users. Given this issue, we propose a novel privacy-preserved optimal dispatch approach for the distribution system incorporating buildings. Using it, the DSO can exploit the thermal flexibility of buildings without accessing their private information, such as model parameters and indoor temperature profiles. Specifically, we first develop an optimal dispatch model for the distribution system integrating buildings, which can be extended to other storage-like flexibility resources. Second, we reveal that the privacy-preserved integration of buildings is a joint privacy preservation problem for both parameters and state variables and then design a privacy-preserved algorithm based on transformation-based encryption, constraint relaxation, and constraint extension techniques. Besides, we implement a detailed privacy analysis for the proposed method, considering both semi-honest adversaries and external eavesdroppers. Case studies demonstrate the accuracy, privacy-preserved performance, and computational efficiency of the proposed method.
comment: Accepted for publication in IEEE Transactions on Industrial Informatics
Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields
This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on the consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.
Automatic Basis Function Selection in Iterative Learning Control: A Sparsity-Promoting Approach Applied to an Industrial Printer
Iterative learning control (ILC) techniques are capable of improving the tracking performance of control systems that repeatedly perform similar tasks by utilizing data from past iterations. The aim of this paper is to design a systematic approach for learning parameterized feedforward signals with limited complexity. The developed method involves an iterative learning control in conjunction with a data-driven sparse subset selection procedure for basis function selection. The ILC algorithm that employs sparse optimization is able to automatically select relevant basis functions and is validated on an industrial flatbed printer.
Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency
Heating, Ventilation, and Air Conditioning (HVAC) systems account for approximately 38% of building energy consumption globally, making them one of the most energy-intensive services. The increasing emphasis on energy efficiency and sustainability, combined with the need for enhanced occupant comfort, presents a significant challenge for traditional HVAC systems. These systems often fail to dynamically adjust to real-time changes in electricity market rates or individual comfort preferences, leading to increased energy costs and reduced comfort. In response, we propose a Human-in-the-Loop (HITL) Artificial Intelligence framework that optimizes HVAC performance by incorporating real-time user feedback and responding to fluctuating electricity prices. Unlike conventional systems that require predefined information about occupancy or comfort levels, our approach learns and adapts based on ongoing user input. By integrating the occupancy prediction model with reinforcement learning, the system improves operational efficiency and reduces energy costs in line with electricity market dynamics, thereby contributing to demand response initiatives. Through simulations, we demonstrate that our method achieves significant cost reductions compared to baseline approaches while maintaining or enhancing occupant comfort. This feedback-driven approach ensures personalized comfort control without the need for predefined settings, offering a scalable solution that balances individual preferences with economic and environmental goals.
comment: ACM e-Energy 2025
Formation Maneuver Control Based on the Augmented Laplacian Method
This paper proposes a novel formation maneuver control method for both 2-D and 3-D space, which enables the formation to translate, scale, and rotate with arbitrary orientation. The core innovation is the novel design of weights in the proposed augmented Laplacian matrix. Instead of using scalars, we represent weights as matrices, which are designed based on a specified rotation axis and allow the formation to perform rotation in 3-D space. To further improve the flexibility and scalability of the formation, the rotational axis adjustment approach and dynamic agent reconfiguration method are developed, allowing formations to rotate around arbitrary axes in 3-D space and new agents to join the formation. Theoretical analysis is provided to show that the proposed approach preserves the original configuration of the formation. The proposed method maintains the advantages of the complex Laplacian-based method, including reduced neighbor requirements and no reliance on generic or convex nominal configurations, while achieving arbitrary orientation rotations via a more simplified implementation. Simulations in both 2-D and 3-D space validate the effectiveness of the proposed method.
DeepSync: A Learning Framework for Pervasive Localization using Code Synchronization on Compressed Cellular Spectrum
Pervasive localization is essential for continuous tracking applications, yet existing solutions face challenges in balancing power consumption and accuracy. GPS, while precise, is impractical for continuous tracking of micro-assets due to high power requirements. Recent advances in non-linear compressed spectrum sensing offer low-power alternatives, but existing implementations achieve only coarse positioning through Received Signal Strength Indicator (RSSI) measurements. We present DeepSync, a deep learning framework that enables precise localization using compressed cellular spectrum. Our key technical insight lies in formulating sub-sample timing estimation as a template matching problem, solved through a novel architecture combining temporal CNN encoders for multi-frame processing with cross-attention mechanisms. The system processes non-linear inter-modulated spectrum through hierarchical feature extraction, achieving robust performance at SNR levels below -10dB -- a regime where conventional timing estimation fails. By integrating real cellular infrastructure data with physics-based ray-tracing simulations, DeepSync achieves 2.128-meter median accuracy while consuming significantly less power than conventional systems. Real-world evaluations demonstrate 10x improvement over existing compressed spectrum approaches, establishing a new paradigm for ultra-low-power localization.
A Feedback Control Framework for Incentivised Suburban Parking Utilisation and Urban Core Traffic Relief
Urban traffic congestion, exacerbated by inefficient parking management and cruising for parking, significantly hampers mobility and sustainability in smart cities. Drivers often face delays searching for parking spaces, influenced by factors such as accessibility, cost, distance, and available services such as charging facilities in the case of electric vehicles. These inefficiencies contribute to increased urban congestion, fuel consumption, and environmental impact. Addressing these challenges, this paper proposes a feedback control incentivisation-based system that aims to better distribute vehicles between city and suburban parking facilities offering park-and-charge/-ride services. Individual driver behaviours are captured via discrete choice models incorporating factors of importance to parking location choice among drivers, such as distance to work, public transport connectivity, charging infrastructure availability, and amount of incentive offered; and are regulated through principles of ergodic control theory. The proposed framework is applied to an electric vehicle park-and-charge/-ride problem, and demonstrates how predictable long-term behaviour of the system can be guaranteed.
Connected C-Core Hybrid SRMs for EV Applications
This paper proposes a new class of permanent magnet-assisted three-phase switched reluctance motors (PM-SRMs) designed to achieve significantly higher torque density for electric vehicle (EV) propulsion systems. Eight distinct motor topologies are systematically investigated, including a non-PM baseline design, three innovative PM arrangement strategies, and two optimized rotor/stator teeth configurations (22-pole and 26-pole variants). The study presents analytical models including magnetic equivalent circuits (MECs), detailed operating principles, and generalized design formulations that account for both electromagnetic and structural considerations. A key contribution is the introduction of the point-of-conversion (PoC) concept, which optimizes PM placement by minimizing magnetic path reluctance. Comparative analysis demonstrates torque density improvements over conventional SRMs and existing PM-assisted designs while maintaining structural robustness. Experimental validation confirms that the proposed 24/22 configuration with inter-phase PMs delivers higher torque per PM volume compared to state-of-the-art designs. The findings provide insights for EV motor designers seeking to balance performance, cost, and reliability.
Robust Management of Airport Security Queues Considering Passenger Non-compliance with Chance-Constrained Optimization
The long waiting time at airport security has become an emergent issue as demand for air travel continues to grow. Not only does queuing at security cause passengers to miss their flights, but also reduce the amount of time passengers spend at the airport post-security, potentially leading to less revenue for the airport operator. One of the key issues to address to reduce waiting time is the management of arrival priority. As passengers on later flights can arrive before passengers on earlier flights, the security system does not always process passengers in the order of the degree of urgency. In this paper, we propose a chance-constrained optimization model that decides in which time slot passengers should be recommended to arrive. We use chance constraints to obtain solutions that take the uncertainty in passenger non-compliance into account. The experimental results, based on a sample day of flight schedules at the Barcelona airport, show a reduction of 85% in the total waiting time. Compared to the deterministic case, in which passengers are assumed to fully comply with the recommendations, we see a 30% increase in the reduction of the total waiting time. This highlights the importance of considering variation in passenger compliance in the management of airport security queues.
Handling Pedestrian Uncertainty in Coordinating Autonomous Vehicles at Signal-Free Intersections
In this paper, we provide a theoretical framework for the coordination of connected and automated vehicles (CAVs) at signal-free intersections, accounting for the unexpected presence of pedestrians. First, we introduce a general vehicle-to-infrastructure communication protocol and a low-level controller that determines the optimal unconstrained trajectories for CAVs, in terms of fuel efficiency and travel time, to cross the intersection without considering pedestrians. If such an unconstrained trajectory is unattainable, we introduce sufficient certificates for each CAV to cross the intersection while respecting the associated constraints. Next, we consider the case where an unexpected pedestrian enters the road. When the CAV's sensors detect a pedestrian, an emergency mode is activated, which imposes certificates related to an unsafe set in the pedestrian's proximity area. Simultaneously, a re-planning mechanism is implemented for all CAVs to accommodate the trajectories of vehicles operating in emergency mode. Finally, we validate the efficacy of our approach through simulations conducted in MATLAB and RoadRunner softwares, which facilitate the integration of sensor tools and the realization of real-time implementation.
Autonomous Vision-Based Magnetic Microrobotic Pushing of Micro-Objects and Cells
Accurate and autonomous transportation of micro-objects and biological cells can enable significant advances in a wide variety of research disciplines. Here, we present a novel, vision-based, model-free microrobotic pushing algorithm for the autonomous manipulation of micro objects and biological cells. The algorithm adjusts the axis of a rotating magnetic field that in turn controls the heading angle and spin axis of a spherical Janus rolling microrobot. We introduce the concept of a microrobotic guiding corridor to constrain the object and to avoid pushing failures. We then show that employing only two simple conditions, the microrobot is able to successfully and autonomously push microscale objects along predefined trajectories. We evaluate the performance of the algorithm by measuring the mean absolute error and completion time relative to a desired path at different actuation frequencies and guiding corridor widths. Finally, we demonstrate biomedical applicability by autonomously transporting a single biological cell, highlighting the methods potential for applications in tissue engineering, drug delivery and synthetic biology.
Development of Reduced Feeder and Load Models Using Practical Topological and Loading Data
Distribution feeder and load model reduction methods are essential for maintaining a good tradeoff between accurate representation of grid behavior and reduced computational complexity in power system studies. An effective algorithm to obtain a reduced order representation of the practical feeders using utility topological and loading data has been presented in this paper. Simulations conducted in this work show that the reduced feeder and load model of a utility feeder, obtained using the proposed method, can accurately capture contactor and motor stalling behaviors for critical events such as fault induced delayed voltage recovery.
Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models
As AI models scale to billions of parameters and operate with increasing autonomy, ensuring their safe, reliable operation demands engineering-grade security and assurance frameworks. This paper presents an enterprise-level, risk-aware, security-by-design approach for large-scale autonomous AI systems, integrating standardized threat metrics, adversarial hardening techniques, and real-time anomaly detection into every phase of the development lifecycle. We detail a unified pipeline - from design-time risk assessments and secure training protocols to continuous monitoring and automated audit logging - that delivers provable guarantees of model behavior under adversarial and operational stress. Case studies in national security, open-source model governance, and industrial automation demonstrate measurable reductions in vulnerability and compliance overhead. Finally, we advocate cross-sector collaboration - uniting engineering teams, standards bodies, and regulatory agencies - to institutionalize these technical safeguards within a resilient, end-to-end assurance ecosystem for the next generation of AI.
Direct Data Driven Control Using Noisy Measurements
This paper presents a novel direct data-driven control framework for solving the linear quadratic regulator (LQR) under disturbances and noisy state measurements. The system dynamics are assumed unknown, and the LQR solution is learned using only a single trajectory of noisy input-output data while bypassing system identification. Our approach guarantees mean-square stability (MSS) and optimal performance by leveraging convex optimization techniques that incorporate noise statistics directly into the controller synthesis. First, we establish a theoretical result showing that the MSS of an uncertain data-driven system implies the MSS of the true closed-loop system. Building on this, we develop a robust stability condition using linear matrix inequalities (LMIs) that yields a stabilizing controller gain from noisy measurements. Finally, we formulate a data-driven LQR problem as a semidefinite program (SDP) that computes an optimal gain, minimizing the steady-state covariance. Extensive simulations on benchmark systems -- including a rotary inverted pendulum and an active suspension system -- demonstrate the superior robustness and accuracy of our method compared to existing data-driven LQR approaches. The proposed framework offers a practical and theoretically grounded solution for controller design in noise-corrupted environments where system identification is infeasible.
comment: Submitted to IEEE-TAC
Reconstructing Brain Causal Dynamics for Subject and Task Fingerprints using fMRI Time-series Data
Purpose: Recently, there has been a revived interest in system neuroscience causation models, driven by their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, we present a novel method that leverages causal dynamics to achieve effective fMRI-based subject and task fingerprinting. Methods: By applying an implicit-explicit discretization scheme, we develop a two-timescale linear state-space model. Through data-driven identification of its parameters, the model captures causal signatures, including directed interactions among brain regions from a spatial perspective, and disentangled fast and slow dynamic modes of brain activity from a temporal perspective. These causal signatures are then integrated with: (i) a modal decomposition and projection method for model-based subject identification, and (ii) a Graph Neural Network (GNN) framework for learning-based task classification. Furthermore, we introduce the concept of the brain reachability landscape as a novel visualization tool, which quantitatively characterizes the maximum possible activation levels of brain regions under various fMRI tasks. Results: We evaluate the proposed approach using the Human Connectome Project dataset and demonstrate its advantage over non-causality-based methods. The obtained causal signatures are visualized and demonstrate clear biological relevance with established understandings of brain function. Conclusion: We verified the feasibility and effectiveness of utilizing brain causal signatures for subject and task fingerprinting. Additionally, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.
Interval reduced-order switched positive observers for uncertain switched positive linear systems
In this paper, existence conditions and a design procedure of reduced-order switched positive observers for continuous- and discrete-time switched positive linear systems with uncertainty are established. In the analyzed class, arbitrary switching is permitted, whereas the uncertainty expressed via matrix inequalities concerns both the initial state and system parameters. Positive lower and positive upper interval switched observers are obtained. The proposed observers are of (n - p) order, where n is the dimension of the state vector and p is the rank of the output matrix, i.e., p-dimensional measurement information. Moreover, as a special case, existence conditions and a design procedure of reduced-order positive observers for uncertain positive linear systems without switching are provided. The theoretical findings are illustrated by two numerical examples for continuous- and discrete-time systems.
comment: 11 pages, 4 figures
Modelling of a DC-DC Buck Converter Using Long-Short-Term-Memory (LSTM)
Artificial neural networks make it possible to identify black-box models. Based on a recurrent nonlinear autoregressive exogenous neural network, this research provides a technique for simulating the static and dynamic behavior of a DC-DC power converter. This approach employs an algorithm for training a neural network using the inputs and outputs (currents and voltages) of a Buck converter. The technique is validated using simulated data of a realistic Simulink-programmed nonsynchronous Buck converter model and experimental findings. The correctness of the technique is determined by comparing the predicted outputs of the neural network to the actual outputs of the system, thereby confirming the suggested strategy. Simulation findings demonstrate the practicability and precision of the proposed black-box method.
comment: Outdated results
Exact Imposition of Safety Boundary Conditions in Neural Reachable Tubes ICRA 2025
Hamilton-Jacobi (HJ) reachability analysis is a widely adopted verification tool to provide safety and performance guarantees for autonomous systems. However, it involves solving a partial differential equation (PDE) to compute a safety value function, whose computational and memory complexity scales exponentially with the state dimension, making its direct application to large-scale systems intractable. To overcome these challenges, DeepReach, a recently proposed learning-based approach, approximates high-dimensional reachable tubes using neural networks (NNs). While shown to be effective, the accuracy of the learned solution decreases with system complexity. One of the reasons for this degradation is a soft imposition of safety constraints during the learning process, which corresponds to the boundary conditions of the PDE, resulting in inaccurate value functions. In this work, we propose ExactBC, a variant of DeepReach that imposes safety constraints exactly during the learning process by restructuring the overall value function as a weighted sum of the boundary condition and the NN output. Moreover, the proposed variant no longer needs a boundary loss term during the training process, thus eliminating the need to balance different loss terms. We demonstrate the efficacy of the proposed approach in significantly improving the accuracy of the learned value function for four challenging reachability tasks: a rimless wheel system with state resets, collision avoidance in a cluttered environment, autonomous rocket landing, and multi-aircraft collision avoidance.
comment: First two authors have contributed equally. 7 Pages, 3 figures. Accepted at ICRA 2025
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
Reinforcement Learning (RL) methods used for solving real-world optimization problems often involve dynamic state-action spaces, larger scale, and sparse rewards, leading to significant challenges in convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a novel residual connection between input and output tokens crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT tackles the sparse rewards limitations of online RL algorithms and delivers high-quality solutions in real-time. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and prove that its performance is superior and requires significantly fewer training trajectories, thus improving sample efficiency compared to existing DT and offline RL baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior offline and online RL approaches.
GreenLight-Gym: Reinforcement learning benchmark environment for control of greenhouse production systems
This study presents GreenLight-Gym, a new, fast, open-source benchmark environment for developing reinforcement learning (RL) methods in greenhouse crop production control. Built on the state-of-the-art GreenLight model, it features a differentiable C++ implementation leveraging the CasADi framework for efficient numerical integration. GreenLight-Gym improves simulation speed by a factor of 17 over the original GreenLight implementation. A modular Python environment wrapper enables flexible configuration of control tasks and RL-based controllers. This flexibility is demonstrated by learning controllers under parametric uncertainty using two well-known RL algorithms. GreenLight-Gym provides a standardized benchmark for advancing RL methodologies and evaluating greenhouse control solutions under diverse conditions. The greenhouse control community is encouraged to use and extend this benchmark to accelerate innovation in greenhouse crop production.
comment: This submission replaces our previous pre-print with the version accepted to the 2025 IFAC conference. A new Git repository (https://github.com/BartvLaatum/GreenLight-Gym) accompanies this paper; the repository for the prior version remains live at https://github.com/YourOrg/GreenLightGym. The earlier pre-print is still available on ArXiv under the previous submission number
Parameter Invariance Analysis of Moment Equations Using Dulmage-Mendelsohn Decomposition
Living organisms maintain stable functioning amid environmental fluctuations through homeostasis, a property that preserves a system's behavior despite changes in environmental conditions. To elucidate homeostasis in stochastic biochemical reactions, theoretical tools for assessing population-level invariance under parameter perturbations are crucial. In this paper, we propose a systematic method for identifying the stationary moments that remain invariant under parameter perturbations by leveraging the structural properties of the stationary moment equations. A key step in this development is addressing the underdetermined nature of moment equations, which has traditionally made it difficult to characterize how stationary moments depend on system parameters. To overcome this, we utilize the Dulmage-Mendelsohn (DM) decomposition of the coefficient matrix to extract welldetermined subequations and reveal their hierarchical structure. Leveraging this structure, we identify stationary moments whose partial derivatives with respect to parameters are structurally zero, facilitating the exploration of fundamental constraints that govern homeostatic behavior in stochastic biochemical systems.
A First-Order Gradient Approach for the Connectivity Optimization of Markov Chains
Graphs are commonly used to model various complex systems, including social networks, power grids, transportation networks, and biological systems. In many applications, the connectivity of these networks can be expressed through the Mean First Passage Times (MFPTs) of a Markov chain modeling a random walker on the graph. In this paper, we generalize the network metrics based on Markov chains' MFPTs and extend them to networks affected by uncertainty, in which edges may fail and hence not be present according to a pre-determined stochastic model. To find optimally connected Markov chains, we present a parameterization-free method for optimizing the MFPTs of the Markov chain. More specifically, we present an efficient Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm in the context of Markov chain optimization. The proposed algorithm is suitable for both fixed and random networks. Using various numerical experiments, we demonstrate scalability compared to established benchmarks. Importantly, our algorithm finds an optimal solution without requiring prior knowledge of edge failure probabilities, allowing for an online optimization approach.
comment: 15 pages, 6 figures
Tuning a Cascaded Online Feedback Optimization Controller for Provision of Distributed Flexibility
Coordinating a high number of flexibility providing units (e.g. to provide ancillary services for the transmission system) across various grid layers requires new control concepts. A flexibility request at a point of common coupling can be met by utilizing a cascaded control structure based on online feedback optimization. In this paper the influence of the parameterization of the individual controllers on the performance of the hierarchical flexibility provision is studied on a three-level test system. The results show a high interdependency between the choice of control parameters of one controller and the behavior of other controllers as well as a significant impact on the accuracy and speed of flexibility provision. With a careful tuning, a cascaded structure based on online feedback optimization can achieve efficient vertical coordination of flexibility providing units.
comment: Published in 2024 International Conference on Smart Energy Systems and Technologies (SEST)
Opacity Enforcement by Edit Functions Under Incomparable Observations
As an information-flow privacy property, opacity characterizes whether a malicious external observer (referred to as an intruder) is able to infer the secret behavior of a system. This paper addresses the problem of opacity enforcement using edit functions in discrete event systems modeled by partially observed deterministic finite automata. A defender uses the edit function as an interface at the output of a system to manipulate actual observations through insertion, substitution, and deletion operations so that the intruder will be prevented from inferring the secret behavior of the system. Unlike existing work which usually assumes that the observation capabilities of the intruder and the defender are identical, we consider a more general setting where they may observe incomparable subsets of events generated by the system.To characterize whether the defender has the ability to enforce opacity of the system under this setting, the notion of \emph{$ic$-enforceability} is introduced. Then, the opacity enforcement problem is transformed to a two-player game, with imperfect information between the system and the defender, which can be used to determine a feasible decision-making strategy for the defender. Within the game scheme, an edit mechanism is constructed to enumerate all feasible edit actions following system behavior. We further show that an $ic$-enforcing edit function (if one exists) can be synthesized from the edit mechanism to enforce opacity.
Carbon-Aware Quality Adaptation for Energy-Intensive Services
The energy demand of modern cloud services, particularly those related to generative AI, is increasing at an unprecedented pace. To date, carbon-aware computing strategies have primarily focused on batch process scheduling or geo-distributed load balancing. However, such approaches are not applicable to services that require constant availability at specific locations due to latency, privacy, data, or infrastructure constraints. In this paper, we explore how the carbon footprint of energy-intensive services can be reduced by adjusting the fraction of requests served by different service quality tiers. We show that adapting this quality of responses with respect to grid carbon intensity can lead to additional carbon savings beyond resource and energy efficiency and introduce a forecast-based multi-horizon optimization that reaches close-to-optimal carbon savings.
comment: Published at e-Energy'25
Robotics
DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
We present DSDrive, a streamlined end-to-end paradigm tailored for integrating the reasoning and planning of autonomous vehicles into a unified framework. DSDrive leverages a compact LLM that employs a distillation method to preserve the enhanced reasoning capabilities of a larger-sized vision language model (VLM). To effectively align the reasoning and planning tasks, a waypoint-driven dual-head coordination module is further developed, which synchronizes dataset structures, optimization objectives, and the learning process. By integrating these tasks into a unified framework, DSDrive anchors on the planning results while incorporating detailed reasoning insights, thereby enhancing the interpretability and reliability of the end-to-end pipeline. DSDrive has been thoroughly tested in closed-loop simulations, where it performs on par with benchmark models and even outperforms in many key metrics, all while being more compact in size. Additionally, the computational efficiency of DSDrive (as reflected in its time and memory requirements during inference) has been significantly enhanced. Evidently thus, this work brings promising aspects and underscores the potential of lightweight systems in delivering interpretable and efficient solutions for AD.
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting and informing users about when to trust these systems. This survey reviews studies on trust dynamics in user-VLM interactions, through a multi-disciplinary taxonomy encompassing different cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future VLM trust studies.
CottonSim: Development of an autonomous visual-guided robotic cotton-picking system in the Gazebo
In this study, an autonomous visual-guided robotic cotton-picking system, built on a Clearpath's Husky robot platform and the Cotton-Eye perception system, was developed in the Gazebo robotic simulator. Furthermore, a virtual cotton farm was designed and developed as a Robot Operating System (ROS 1) package to deploy the robotic cotton picker in the Gazebo environment for simulating autonomous field navigation. The navigation was assisted by the map coordinates and an RGB-depth camera, while the ROS navigation algorithm utilized a trained YOLOv8n-seg model for instance segmentation. The model achieved a desired mean Average Precision (mAP) of 85.2%, a recall of 88.9%, and a precision of 93.0% for scene segmentation. The developed ROS navigation packages enabled our robotic cotton-picking system to autonomously navigate through the cotton field using map-based and GPS-based approaches, visually aided by a deep learning-based perception system. The GPS-based navigation approach achieved a 100% completion rate (CR) with a threshold of 5 x 10^-6 degrees, while the map-based navigation approach attained a 96.7% CR with a threshold of 0.25 m. This study establishes a fundamental baseline of simulation for future agricultural robotics and autonomous vehicles in cotton farming and beyond. CottonSim code and data are released to the research community via GitHub: https://github.com/imtheva/CottonSim
comment: 45 pages, 15 figures, 4 tables
Localization and path following for an autonomous e-scooter
In order to mitigate economical, ecological, and societal challenges in electric scooter (e-scooter) sharing systems, we develop an autonomous e-scooter prototype. Our vision is to design a fully autonomous prototype that can find its way to the next parking spot, high-demand area, or charging station. In this work, we propose a path following solution to enable localization and navigation in an urban environment with a provided path to follow. We design a closed-loop architecture that solves the localization and path following problem while allowing the e-scooter to maintain its balance with a previously developed reaction wheel mechanism. Our approach facilitates state and input constraints, e.g., adhering to the path width, while remaining executable on a Raspberry Pi 5. We demonstrate the efficacy of our approach in a real-world experiment on our prototype.
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
We introduce the novel task of Language-Guided Object Placement in Real 3D Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual prompt broadly describing where the 3D asset should be placed. The task here is to find a valid placement for the 3D asset that respects the prompt. Compared with other language-guided localization tasks in 3D scenes such as grounding, this task has specific challenges: it is ambiguous because it has multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by proposing a new benchmark and evaluation protocol. We also introduce a new dataset for training 3D LLMs on this task, as well as the first method to serve as a non-trivial baseline. We believe that this challenging task and our new benchmark could become part of the suite of benchmarks used to evaluate and compare generalist 3D LLM models.
comment: Tech report. Project page: https://nianticlabs.github.io/placeit3d/
Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
Humans naturally exhibit bilateral symmetry in their gross manipulation skills, effortlessly mirroring simple actions between left and right hands. Bimanual robots-which also feature bilateral symmetry-should similarly exploit this property to perform tasks with either hand. Unlike humans, who often favor a dominant hand for fine dexterous skills, robots should ideally execute ambidextrous manipulation with equal proficiency. To this end, we introduce SYMDEX (SYMmetric DEXterity), a reinforcement learning framework for ambidextrous bi-manipulation that leverages the robot's inherent bilateral symmetry as an inductive bias. SYMDEX decomposes complex bimanual manipulation tasks into per-hand subtasks and trains dedicated policies for each. By exploiting bilateral symmetry via equivariant neural networks, experience from one arm is inherently leveraged by the opposite arm. We then distill the subtask policies into a global ambidextrous policy that is independent of the hand-task assignment. We evaluate SYMDEX on six challenging simulated manipulation tasks and demonstrate successful real-world deployment on two of them. Our approach strongly outperforms baselines on complex task in which the left and right hands perform different roles. We further demonstrate SYMDEX's scalability by extending it to a four-arm manipulation setup, where our symmetry-aware policies enable effective multi-arm collaboration and coordination. Our results highlight how structural symmetry as inductive bias in policy learning enhances sample efficiency, robustness, and generalization across diverse dexterous manipulation tasks.
Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving
Human drivers exhibit individual preferences regarding driving style. Adapting autonomous vehicles to these preferences is essential for user trust and satisfaction. However, existing end-to-end driving approaches often rely on predefined driving styles or require continuous user feedback for adaptation, limiting their ability to support dynamic, context-dependent preferences. We propose a novel approach using multi-objective reinforcement learning (MORL) with preference-driven optimization for end-to-end autonomous driving that enables runtime adaptation to driving style preferences. Preferences are encoded as continuous weight vectors to modulate behavior along interpretable style objectives$\unicode{x2013}$including efficiency, comfort, speed, and aggressiveness$\unicode{x2013}$without requiring policy retraining. Our single-policy agent integrates vision-based perception in complex mixed-traffic scenarios and is evaluated in diverse urban environments using the CARLA simulator. Experimental results demonstrate that the agent dynamically adapts its driving behavior according to changing preferences while maintaining performance in terms of collision avoidance and route completion.
Online Velocity Profile Generation and Tracking for Sampling-Based Local Planning Algorithms in Autonomous Racing Environments
This work presents an online velocity planner for autonomous racing that adapts to changing dynamic constraints, such as grip variations from tire temperature changes and rubber accumulation. The method combines a forward-backward solver for online velocity optimization with a novel spatial sampling strategy for local trajectory planning, utilizing a three-dimensional track representation. The computed velocity profile serves as a reference for the local planner, ensuring adaptability to environmental and vehicle dynamics. We demonstrate the approach's robust performance and computational efficiency in racing scenarios and discuss its limitations, including sensitivity to deviations from the predefined racing line and high jerk characteristics of the velocity profile.
comment: 8 Pages, accepted to be published at the the IEEE Intelligent Vehicles Symposium (IV 2025), June 22-25 in Cluj, Romania
X-Driver: Explainable Autonomous Driving with Vision-Language Models
End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance in both open-loop and closed-loop settings than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. In this paper, we introduce X-Driver, a unified multi-modal large language models(MLLMs) framework designed for closed-loop autonomous driving, leveraging Chain-of-Thought(CoT) and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in CARLA simulation environment, including Bench2Drive[6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state-of-the-art(SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.
The City that Never Settles: Simulation-based LiDAR Dataset for Long-Term Place Recognition Under Extreme Structural Changes
Large-scale construction and demolition significantly challenge long-term place recognition (PR) by drastically reshaping urban and suburban environments. Existing datasets predominantly reflect limited or indoor-focused changes, failing to adequately represent extensive outdoor transformations. To bridge this gap, we introduce the City that Never Settles (CNS) dataset, a simulation-based dataset created using the CARLA simulator, capturing major structural changes-such as building construction and demolition-across diverse maps and sequences. Additionally, we propose TCR_sym, a symmetric version of the original TCR metric, enabling consistent measurement of structural changes irrespective of source-target ordering. Quantitative comparisons demonstrate that CNS encompasses more extensive transformations than current real-world benchmarks. Evaluations of state-of-the-art LiDAR-based PR methods on CNS reveal substantial performance degradation, underscoring the need for robust algorithms capable of handling significant environmental changes. Our dataset is available at https://github.com/Hyunho111/CNS_dataset.
Visual Affordances: Enabling Robots to Understand Object Functionality
Human-robot interaction for assistive technologies relies on the prediction of affordances, which are the potential actions a robot can perform on objects. Predicting object affordances from visual perception is formulated differently for tasks such as grasping detection, affordance classification, affordance segmentation, and hand-object interaction synthesis. In this work, we highlight the reproducibility issue in these redefinitions, making comparative benchmarks unfair and unreliable. To address this problem, we propose a unified formulation for visual affordance prediction, provide a comprehensive and systematic review of previous works highlighting strengths and limitations of methods and datasets, and analyse what challenges reproducibility. To favour transparency, we introduce the Affordance Sheet, a document to detail the proposed solution, the datasets, and the validation. As the physical properties of an object influence the interaction with the robot, we present a generic framework that links visual affordance prediction to the physical world. Using the weight of an object as an example for this framework, we discuss how estimating object mass can affect the affordance prediction. Our approach bridges the gap between affordance perception and robot actuation, and accounts for the complete information about objects of interest and how the robot interacts with them to accomplish its task.
comment: 24 pages, 12 figures, 10 tables. Project website at https://apicis.github.io/aff-survey/
CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations
Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations-e.g., from video demonstrations-to learn latent action labels in an unsupervised way. However, we find that existing methods struggle when applied to complex robot tasks requiring fine-grained motions. We design continuous latent action models (CLAM) which incorporate two key ingredients we find necessary for learning to solve complex continuous control tasks from unlabeled observation data: (a) using continuous latent action labels instead of discrete representations, and (b) jointly training an action decoder to ensure that the latent action space can be easily grounded to real actions with relatively few labeled examples. Importantly, the labeled examples can be collected from non-optimal play data, enabling CLAM to learn performant policies without access to any action-labeled expert data. We demonstrate on continuous control benchmarks in DMControl (locomotion) and MetaWorld (manipulation), as well as on a real WidowX robot arm that CLAM significantly outperforms prior state-of-the-art methods, remarkably with a 2-3x improvement in task success rate compared to the best baseline. Videos and code can be found at clamrobot.github.io.
comment: Latent Action Models, Self-supervised Pretraining, Learning from Videos
CPP-DIP: Multi-objective Coverage Path Planning for MAVs in Dispersed and Irregular Plantations
Coverage Path Planning (CPP) is vital in precision agriculture to improve efficiency and resource utilization. In irregular and dispersed plantations, traditional grid-based CPP often causes redundant coverage over non-vegetated areas, leading to waste and pollution. To overcome these limitations, we propose CPP-DIP, a multi-objective CPP framework designed for Micro Air Vehicles (MAVs). The framework transforms the CPP task into a Traveling Salesman Problem (TSP) and optimizes flight paths by minimizing travel distance, turning angles, and intersection counts. Unlike conventional approaches, our method does not rely on GPS-based environmental modeling. Instead, it uses aerial imagery and a Histogram of Oriented Gradients (HOG)-based approach to detect trees and extract image coordinates. A density-aware waypoint strategy is applied: Kernel Density Estimation (KDE) is used to reduce redundant waypoints in dense regions, while a greedy algorithm ensures complete coverage in sparse areas. To verify the generality of the framework, we solve the resulting TSP using three different methods: Greedy Heuristic Insertion (GHI), Ant Colony Optimization (ACO), and Monte Carlo Reinforcement Learning (MCRL). Then an object-based optimization is applied to further refine the resulting path. Additionally, CPP-DIP integrates ForaNav, our insect-inspired navigation method, for accurate tree localization and tracking. The experimental results show that MCRL offers a balanced solution, reducing the travel distance by 16.9 % compared to ACO while maintaining a similar performance to GHI. It also improves path smoothness by reducing turning angles by 28.3 % and 59.9 % relative to ACO and GHI, respectively, and effectively eliminates intersections. These results confirm the robustness and effectiveness of CPP-DIP in different TSP solvers.
A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ("edge cases") that go beyond the autonomous capabilities - such as construction zones or encounters with emergency responders - the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, both in autonomous and remotely operated modes.
comment: Intelligent Vehicles Symposium 2025
LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture
This paper proposes a novel Large Vision-Language Model (LVLM) and Model Predictive Control (MPC) integration framework that delivers both task scalability and safety for Autonomous Driving (AD). LVLMs excel at high-level task planning across diverse driving scenarios. However, since these foundation models are not specifically designed for driving and their reasoning is not consistent with the feasibility of low-level motion planning, concerns remain regarding safety and smooth task switching. This paper integrates LVLMs with MPC Builder, which automatically generates MPCs on demand, based on symbolic task commands generated by the LVLM, while ensuring optimality and safety. The generated MPCs can strongly assist the execution or rejection of LVLM-driven task switching by providing feedback on the feasibility of the given tasks and generating task-switching-aware MPCs. Our approach provides a safe, flexible, and adaptable control framework, bridging the gap between cutting-edge foundation models and reliable vehicle operation. We demonstrate the effectiveness of our approach through a simulation experiment, showing that our system can safely and effectively handle highway driving while maintaining the flexibility and adaptability of LVLMs.
comment: 8 pages, 8 figures
Robust Model-Based In-Hand Manipulation with Integrated Real-Time Motion-Contact Planning and Tracking
Robotic dexterous in-hand manipulation, where multiple fingers dynamically make and break contact, represents a step toward human-like dexterity in real-world robotic applications. Unlike learning-based approaches that rely on large-scale training or extensive data collection for each specific task, model-based methods offer an efficient alternative. Their online computing nature allows for ready application to new tasks without extensive retraining. However, due to the complexity of physical contacts, existing model-based methods encounter challenges in efficient online planning and handling modeling errors, which limit their practical applications. To advance the effectiveness and robustness of model-based contact-rich in-hand manipulation, this paper proposes a novel integrated framework that mitigates these limitations. The integration involves two key aspects: 1) integrated real-time planning and tracking achieved by a hierarchical structure; and 2) joint optimization of motions and contacts achieved by integrated motion-contact modeling. Specifically, at the high level, finger motion and contact force references are jointly generated using contact-implicit model predictive control. The high-level module facilitates real-time planning and disturbance recovery. At the low level, these integrated references are concurrently tracked using a hand force-motion model and actual tactile feedback. The low-level module compensates for modeling errors and enhances the robustness of manipulation. Extensive experiments demonstrate that our approach outperforms existing model-based methods in terms of accuracy, robustness, and real-time performance. Our method successfully completes five challenging tasks in real-world environments, even under appreciable external disturbances.
comment: Submitted to the International Journal of Robotics Research (IJRR)
AI and Vision based Autonomous Navigation of Nano-Drones in Partially-Known Environments
The miniaturisation of sensors and processors, the advancements in connected edge intelligence, and the exponential interest in Artificial Intelligence are boosting the affirmation of autonomous nano-size drones in the Internet of Robotic Things ecosystem. However, achieving safe autonomous navigation and high-level tasks such as exploration and surveillance with these tiny platforms is extremely challenging due to their limited resources. This work focuses on enabling the safe and autonomous flight of a pocket-size, 30-gram platform called Crazyflie 2.1 in a partially known environment. We propose a novel AI-aided, vision-based reactive planning method for obstacle avoidance under the ambit of Integrated Sensing, Computing and Communication paradigm. We deal with the constraints of the nano-drone by splitting the navigation task into two parts: a deep learning-based object detector runs on the edge (external hardware) while the planning algorithm is executed onboard. The results show the ability to command the drone at $\sim8$ frames-per-second and a model performance reaching a COCO mean-average-precision of $60.8$. Field experiments demonstrate the feasibility of the solution with the drone flying at a top speed of $1$ m/s while steering away from an obstacle placed in an unknown position and reaching the target destination. The outcome highlights the compatibility of the communication delay and the model performance with the requirements of the real-time navigation task. We provide a feasible alternative to a fully onboard implementation that can be extended to autonomous exploration with nano-drones.
comment: in DCOSS-IoT 2025, Wi-DroIT 2025
An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects IROS 2022
The proposed system outlined in this paper is a solution to a use case that requires the autonomous picking of cuboidal objects from an organized or unorganized pile with high precision. This paper presents an efficient method for precise pose estimation of cuboid-shaped objects, which aims to reduce errors in target pose in a time-efficient manner. Typical pose estimation methods like global point cloud registrations are prone to minor pose errors for which local registration algorithms are generally used to improve pose accuracy. However, due to the execution time overhead and uncertainty in the error of the final achieved pose, an alternate, linear time approach is proposed for pose error estimation and correction. This paper presents an overview of the solution followed by a detailed description of individual modules of the proposed algorithm.
comment: Accepted in IEEE/RSJ IROS 2022 Workshop on Mobile Manipulation and Embodied Intelligence (MOMA)
ADD: Physics-Based Motion Imitation with Adversarial Differential Discriminators
Multi-objective optimization problems, which require the simultaneous optimization of multiple terms, are prevalent across numerous applications. Existing multi-objective optimization methods often rely on manually tuned aggregation functions to formulate a joint optimization target. The performance of such hand-tuned methods is heavily dependent on careful weight selection, a time-consuming and laborious process. These limitations also arise in the setting of reinforcement-learning-based motion tracking for physically simulated characters, where intricately crafted reward functions are typically used to achieve high-fidelity results. Such solutions not only require domain expertise and significant manual adjustment, but also limit the applicability of the resulting reward function across diverse skills. To bridge this gap, we present a novel adversarial multi-objective optimization technique that is broadly applicable to a range of multi-objective optimization problems, including motion tracking. The proposed adversarial differential discriminator receives a single positive sample, yet is still effective at guiding the optimization process. We demonstrate that our technique can enable characters to closely replicate a variety of acrobatic and agile behaviors, achieving comparable quality to state-of-the-art motion-tracking methods, without relying on manually tuned reward functions. Results are best visualized through https://youtu.be/rz8BYCE9E2w.
comment: 19 pages, 15 figures
Real-Time Model Predictive Control of Vehicles with Convex-Polygon-Aware Collision Avoidance in Tight Spaces SC
This paper proposes vehicle motion planning methods with obstacle avoidance in tight spaces by incorporating polygonal approximations of both the vehicle and obstacles into a model predictive control (MPC) framework. Representing these shapes is crucial for navigation in tight spaces to ensure accurate collision detection. However, incorporating polygonal approximations leads to disjunctive OR constraints in the MPC formulation, which require a mixed integer programming and cause significant computational cost. To overcome this, we propose two different collision-avoidance constraints that reformulate the disjunctive OR constraints as tractable conjunctive AND constraints: (1) a Support Vector Machine (SVM)-based formulation that recasts collision avoidance as a SVM optimization problem, and (2) a Minimum Signed Distance to Edges (MSDE) formulation that leverages minimum signed-distance metrics. We validate both methods through extensive simulations, including tight-space parking scenarios and varied-shape obstacle courses, as well as hardware experiments on an RC-car platform. Our results demonstrate that the SVM-based approach achieves superior navigation accuracy in constrained environments; the MSDE approach, by contrast, runs in real time with only a modest reduction in collision-avoidance performance.
comment: 8 pages, 10 figures, 3 tables, The IEEE International Conference on Intelligent Transportation Systems (ITSC) November 18-21, 2025-Gold Coast, Australia
CubeDAgger: Improved Robustness of Interactive Imitation Learning without Violation of Dynamic Stability
Interactive imitation learning makes an agent's control policy robust by stepwise supervisions from an expert. The recent algorithms mostly employ expert-agent switching systems to reduce the expert's burden by limitedly selecting the supervision timing. However, the precise selection is difficult and such a switching causes abrupt changes in actions, damaging the dynamic stability. This paper therefore proposes a novel method, so-called CubeDAgger, which improves robustness while reducing dynamic stability violations by making three improvements to a baseline method, EnsembleDAgger. The first improvement adds a regularization to explicitly activate the threshold for deciding the supervision timing. The second transforms the expert-agent switching system to an optimal consensus system of multiple action candidates. Third, autoregressive colored noise to the actions is introduced to make the stochastic exploration consistent over time. These improvements are verified by simulations, showing that the learned policies are sufficiently robust while maintaining dynamic stability during interaction.
comment: 7 pages, 4 figures
SatAOI: Delimitating Area of Interest for Swing-Arm Troweling Robot for Construction
In concrete troweling for building construction, robots can significantly reduce workload and improve automation level. However, as a primary task of coverage path planning (CPP) for troweling, delimitating area of interest (AOI) in complex scenes is still challenging, especially for swing-arm robots with more complex working modes. Thus, this research proposes an algorithm to delimitate AOI for swing-arm troweling robot (SatAOI algorithm). By analyzing characteristics of the robot and obstacle maps, mathematical models and collision principles are established. On this basis, SatAOI algorithm achieves AOI delimitation by global search and collision detection. Experiments on different obstacle maps indicate that AOI can be effectively delimitated in scenes under different complexity, and the algorithm can fully consider the connectivity of obstacle maps. This research serves as a foundation for CPP algorithm and full process simulation of swing-arm troweling robots.
D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
Learning bimanual manipulation is challenging due to its high dimensionality and tight coordination required between two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), a method for offline data augmentation tailored to eye-in-hand bimanual imitation learning that trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels. It employs constrained optimization to ensure that augmented states involving gripper-to-object contacts adhere to constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2250 simulation trials and 300 real-world trials demonstrate that it outperforms baselines and ablations, showing its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our project website is at: https://dcodaaug.github.io/D-CODA/.
Zippy: The smallest power-autonomous bipedal robot
Miniaturizing legged robot platforms is challenging due to hardware limitations that constrain the number, power density, and precision of actuators at that size. By leveraging design principles of quasi-passive walking robots at any scale, stable locomotion and steering can be achieved with simple mechanisms and open-loop control. Here, we present the design and control of "Zippy", the smallest self-contained bipedal walking robot at only 3.6 cm tall. Zippy has rounded feet, a single motor without feedback control, and is capable of turning, skipping, and ascending steps. At its fastest pace, the robot achieves a forward walking speed of 25 cm/s, which is 10 leg lengths per second, the fastest biped robot of any size by that metric. This work explores the design and performance of the robot and compares it to similar dynamic walking robots at larger scales.
Adaptive Stress Testing Black-Box LLM Planners
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing black-box methods often detect hallucinations by identifying inconsistencies across multiple samples. Many of these approaches typically introduce prompt perturbations like randomizing detail order or generating adversarial inputs, with the intuition that a confident model should produce stable outputs. We first perform a manual case study showing that other forms of perturbations (e.g., adding noise, removing sensor details) cause LLMs to hallucinate in a driving environment. We then propose a novel method for efficiently searching the space of prompt perturbations using Adaptive Stress Testing (AST) with Monte-Carlo Tree Search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty. By generating MCTS prompt perturbation trees across diverse scenarios, we show that offline analyses can be used at runtime to automatically generate prompts that influence model uncertainty, and to inform real-time trust assessments of an LLM.
comment: 26 pages, 16 figures, 4 tables
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/continental/pred2plan.
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN due to the absence of predefined navigation graphs and the exponentially expanding action space in long-horizon exploration. In this work, we propose \textbf{CityNavAgent}, a large language model (LLM)-empowered agent that significantly reduces the navigation complexity for urban aerial VLN. Specifically, we design a hierarchical semantic planning module (HSPM) that decomposes the long-horizon task into sub-goals with different semantic levels. The agent reaches the target progressively by achieving sub-goals with different capacities of the LLM. Additionally, a global memory module storing historical trajectories into a topological graph is developed to simplify navigation for visited targets. Extensive benchmark experiments show that our method achieves state-of-the-art performance with significant improvement. Further experiments demonstrate the effectiveness of different modules of CityNavAgent for aerial VLN in continuous city environments. The code is available at \href{https://github.com/VinceOuti/CityNavAgent}{link}.
Learning to Drive Anywhere with Model-Based Reannotation11
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
Flight Validation of Learning-Based Trajectory Optimization for the Astrobee Free-Flyer
Although widely used in commercial and industrial robotics, trajectory optimization has seen limited use in space applications due to its high computational demands. In this work, we present flight results from experiments with the Astrobee free-flying robot on board the International Space Station (ISS), that demonstrate how machine learning can accelerate on-board trajectory optimization while preserving theoretical solver guarantees. To the best of the authors' knowledge, this is the first-ever demonstration of learning-based control on the ISS. Our approach leverages the GuSTO sequential convex programming framework and uses a neural network, trained offline, to map problem parameters to effective initial ``warm-start'' trajectories, paving the way for faster real-time optimization on resource-constrained space platforms.
comment: Submitted to RSS 2025 Workshop on Space Robotics
Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars
Reinforcement Learning (RL) has enabled vast performance improvements for robotics systems. To achieve these results though, the agent often must randomly explore the environment, which for safety critical systems presents a significant challenge. Barrier functions can solve this challenge by enabling an override that approximates the RL control input as closely as possible without violating a safety constraint. Unfortunately, this override can be computationally intractable in cases where the dynamics are not convex in the control input or when time is discrete, as is often the case when training RL systems. We therefore consider these cases, developing novel barrier functions for two non-convex systems (fixed wing aircraft and self-driving cars performing lane merging with adaptive cruise control) in discrete time. Although solving for an online and optimal override is in general intractable when the dynamics are nonconvex in the control input, we investigate approximate solutions, finding that these approximations enable performance commensurate with baseline RL methods with zero safety violations. In particular, even without attempting to solve for the optimal override at all, performance is still competitive with baseline RL performance. We discuss the tradeoffs of the approximate override solutions including performance and computational tractability.
comment: This work has been submitted to the IEEE for possible publication
Guidance for Intra-cardiac Echocardiography Manipulation to Maintain Continuous Therapy Device Tip Visibility
Intra-cardiac Echocardiography (ICE) plays a critical role in Electrophysiology (EP) and Structural Heart Disease (SHD) interventions by providing real-time visualization of intracardiac structures. However, maintaining continuous visibility of the therapy device tip remains a challenge due to frequent adjustments required during manual ICE catheter manipulation. To address this, we propose an AI-driven tracking model that estimates the device tip incident angle and passing point within the ICE imaging plane, ensuring continuous visibility and facilitating robotic ICE catheter control. A key innovation of our approach is the hybrid dataset generation strategy, which combines clinical ICE sequences with synthetic data augmentation to enhance model robustness. We collected ICE images in a water chamber setup, equipping both the ICE catheter and device tip with electromagnetic (EM) sensors to establish precise ground-truth locations. Synthetic sequences were created by overlaying catheter tips onto real ICE images, preserving motion continuity while simulating diverse anatomical scenarios. The final dataset consists of 5,698 ICE-tip image pairs, ensuring comprehensive training coverage. Our model architecture integrates a pretrained ultrasound (US) foundation model, trained on 37.4M echocardiography images, for feature extraction. A transformer-based network processes sequential ICE frames, leveraging historical passing points and incident angles to improve prediction accuracy. Experimental results demonstrate that our method achieves 3.32 degree entry angle error, 12.76 degree rotation angle error. This AI-driven framework lays the foundation for real-time robotic ICE catheter adjustments, minimizing operator workload while ensuring consistent therapy device visibility. Future work will focus on expanding clinical datasets to further enhance model generalization.
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like travel distance and number of trials. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics.
comment: Paper is under review
Demonstrating ViSafe: Vision-enabled Safety for High-speed Detect and Avoid
Assured safe-separation is essential for achieving seamless high-density operation of airborne vehicles in a shared airspace. To equip resource-constrained aerial systems with this safety-critical capability, we present ViSafe, a high-speed vision-only airborne collision avoidance system. ViSafe offers a full-stack solution to the Detect and Avoid (DAA) problem by tightly integrating a learning-based edge-AI framework with a custom multi-camera hardware prototype designed under SWaP-C constraints. By leveraging perceptual input-focused control barrier functions (CBF) to design, encode, and enforce safety thresholds, ViSafe can provide provably safe runtime guarantees for self-separation in high-speed aerial operations. We evaluate ViSafe's performance through an extensive test campaign involving both simulated digital twins and real-world flight scenarios. By independently varying agent types, closure rates, interaction geometries, and environmental conditions (e.g., weather and lighting), we demonstrate that ViSafe consistently ensures self-separation across diverse scenarios. In first-of-its-kind real-world high-speed collision avoidance tests with closure rates reaching 144 km/h, ViSafe sets a new benchmark for vision-only autonomous collision avoidance, establishing a new standard for safety in high-speed aerial navigation.
comment: 13 pages, RSS 2025 Demo track, https://theairlab.org/visafe/
Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization
Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictve Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively.
comment: 8 pages, 5 figures, to published in IEEE Sustech 2025
Uncertainty Comes for Free: Human-in-the-Loop Policies with Diffusion Models
Human-in-the-loop (HitL) robot deployment has gained significant attention in both academia and industry as a semi-autonomous paradigm that enables human operators to intervene and adjust robot behaviors at deployment time, improving success rates. However, continuous human monitoring and intervention can be highly labor-intensive and impractical when deploying a large number of robots. To address this limitation, we propose a method that allows diffusion policies to actively seek human assistance only when necessary, reducing reliance on constant human oversight. To achieve this, we leverage the generative process of diffusion policies to compute an uncertainty-based metric based on which the autonomous agent can decide to request operator assistance at deployment time, without requiring any operator interaction during training. Additionally, we show that the same method can be used for efficient data collection for fine-tuning diffusion policies in order to improve their autonomous performance. Experimental results from simulated and real-world environments demonstrate that our approach enhances policy performance during deployment for a variety of scenarios.
An End-to-End Framework for Optimizing Foot Trajectory and Force in Dry Adhesion Legged Wall-Climbing Robots
Foot trajectory planning for dry adhesion legged climbing robots presents challenges, as the phases of foot detachment, swing, and adhesion significantly influence the adhesion and detachment forces essential for stable climbing. To tackle this, an end-to-end foot trajectory and force optimization framework (FTFOF) is proposed, which optimizes foot adhesion and detachment forces through trajectory adjustments. This framework accepts general foot trajectory constraints and user-defined parameters as input, ultimately producing an optimal single foot trajectory. It integrates three-segment $C^2$ continuous Bezier curves, tailored to various foot structures, enabling the generation of effective climbing trajectories. A dilate-based GRU predictive model establishes the relationship between foot trajectories and the corresponding foot forces. Multi-objective optimization algorithms, combined with a redundancy hierarchical strategy, identify the most suitable foot trajectory for specific tasks, thereby ensuring optimal performance across detachment force, adhesion force and vibration amplitude. Experimental validation on the quadruped climbing robot MST-M3F showed that, compared to commonly used trajectories in existing legged climbing robots, the proposed framework achieved reductions in maximum detachment force by 28 \%, vibration amplitude by 82 \%, which ensures the stable climbing of dry adhesion legged climbing robots.
Fast Whole-Body Strain Regulation in Continuum Robots
We propose reaching steps towards the real-time strain control of multiphysics, multiscale continuum soft robots. To study this problem fundamentally, we ground ourselves in a model-based control setting enabled by mathematically precise dynamics of a soft robot prototype. Poised to integrate, rather than reject, inherent mechanical nonlinearities for embodied compliance, we first separate the original robot dynamics into separate subdynamics -- aided by a perturbing time-scale separation parameter. Second, we prescribe a set of stabilizing nonlinear backstepping controllers for regulating the resulting subsystems' strain dynamics. Third, we study the interconnected singularly perturbed system by analyzing and establishing its stability. Fourth, our theories are backed up by fast numerical results on a single arm of the Octopus robot arm. We demonstrate strain regulation to equilibrium, in a significantly reduced time, of the whole-body reduced-order dynamics of an infinite degrees-of-freedom soft robot. This paper communicates our thinking within the backdrop of embodied intelligence: it informs our conceptualization, formulation, computational setup, and yields improved control performance for infinite degrees-of-freedom soft robots.
A Machine Learning Approach to Sensor Substitution from Tactile Sensing to Visual Perception for Non-Prehensile Manipulation
Mobile manipulators are increasingly deployed in complex environments, requiring diverse sensors to perceive and interact with their surroundings. However, equipping every robot with every possible sensor is often impractical due to cost and physical constraints. A critical challenge arises when robots with differing sensor capabilities need to collaborate or perform similar tasks. For example, consider a scenario where a mobile manipulator equipped with high-resolution tactile skin is skilled at non-prehensile manipulation tasks like pushing. If this robot needs to be replaced or augmented by a robot lacking such tactile sensing, the learned manipulation policies become inapplicable. This paper addresses the problem of sensor substitution in non-prehensile manipulation. We propose a novel machine learning-based framework that enables a robot with a limited sensor set (e.g., LiDAR or RGB-D) to effectively perform tasks previously reliant on a richer sensor suite (e.g., tactile skin). Our approach learns a mapping between the available sensor data and the information provided by the substituted sensor, effectively synthesizing the missing sensory input. Specifically, we demonstrate the efficacy of our framework by training a model to substitute tactile skin data for the task of non-prehensile pushing using a mobile manipulator. We show that a manipulator equipped only with LiDAR or RGB-D can, after training, achieve comparable and sometimes even better pushing performance to a mobile base utilizing direct tactile feedback.
comment: 10 pages, 6 figures, submitted to IEEE Sensors Journal, for associated video, see https://youtu.be/6yIRcfn2DsY
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine trajectory predictions. MomAD comprises two core components: (1) Topological Trajectory Matching (TTM) employs Hausdorff Distance to select the optimal planning query that aligns with prior paths to ensure coherence;(2) Momentum Planning Interactor (MPI) cross-attends the selected planning query with historical queries to expand static and dynamic perception files. This enriched query, in turn, helps regenerate long-horizon trajectory and reduce collision risks. To mitigate noise arising from dynamic environments and detection errors, we introduce robust instance denoising during training, enabling the planning model to focus on critical signals and improve its robustness. We also propose a novel Trajectory Prediction Consistency (TPC) metric to quantitatively assess planning stability. Experiments on the nuScenes dataset demonstrate that MomAD achieves superior long-term consistency (>=3s) compared to SOTA methods. Moreover, evaluations on the curated Turning-nuScenes shows that MomAD reduces the collision rate by 26% and improves TPC by 0.97m (33.45%) over a 6s prediction horizon, while closedloop on Bench2Drive demonstrates an up to 16.3% improvement in success rate.
comment: 16 pages, 8 figures
A highly maneuverable flying squirrel drone with agility-improving foldable wings
Drones, like most airborne aerial vehicles, face inherent disadvantages in achieving agile flight due to their limited thrust capabilities. These physical constraints cannot be fully addressed through advancements in control algorithms alone. Drawing inspiration from the winged flying squirrel, this paper proposes a highly maneuverable drone equipped with agility-enhancing foldable wings. By leveraging collaborative control between the conventional propeller system and the foldable wings-coordinated through the Thrust-Wing Coordination Control (TWCC) framework-the controllable acceleration set is expanded, enabling the generation of abrupt vertical forces that are unachievable with traditional wingless drones. The complex aerodynamics of the foldable wings are modeled using a physics-assisted recurrent neural network (paRNN), which calibrates the angle of attack (AOA) to align with the real aerodynamic behavior of the wings. The additional air resistance generated by appropriately deploying these wings significantly improves the tracking performance of the proposed "flying squirrel" drone. The model is trained on real flight data and incorporates flat-plate aerodynamic principles. Experimental results demonstrate that the proposed flying squirrel drone achieves a 13.1% improvement in tracking performance, as measured by root mean square error (RMSE), compared to a conventional wingless drone. A demonstration video is available on YouTube: https://youtu.be/O8nrip18azY.
comment: Accepted to IEEE Robotics and Automation Letters. Project Page : https://jgkang1210.github.io/fsdrone_ral/ , Video : https://www.youtube.com/watch?v=tckIF3KCJig , Dohyeon Lee and Jun-Gill Kang are co-authors
CloudTrack: Scalable UAV Tracking with Cloud Semantics
Nowadays, unmanned aerial vehicles (UAVs) are commonly used in search and rescue scenarios to gather information in the search area. The automatic identification of the person searched for in aerial footage could increase the autonomy of such systems, reduce the search time, and thus increase the missed person's chances of survival. In this paper, we present a novel approach to perform semantically conditioned open vocabulary object tracking that is specifically designed to cope with the limitations of UAV hardware. Our approach has several advantages. It can run with verbal descriptions of the missing person, e.g., the color of the shirt, it does not require dedicated training to execute the mission and can efficiently track a potentially moving person. Our experimental results demonstrate the versatility and efficacy of our approach.
comment: 7 pages, 3 figures
REHEARSE-3D: A Multi-modal Emulated Rain Dataset for 3D Point Cloud De-raining
Sensor degradation poses a significant challenge in autonomous driving. During heavy rainfall, the interference from raindrops can adversely affect the quality of LiDAR point clouds, resulting in, for instance, inaccurate point measurements. This, in turn, can potentially lead to safety concerns if autonomous driving systems are not weather-aware, i.e., if they are unable to discern such changes. In this study, we release a new, large-scale, multi-modal emulated rain dataset, REHEARSE-3D, to promote research advancements in 3D point cloud de-raining. Distinct from the most relevant competitors, our dataset is unique in several respects. First, it is the largest point-wise annotated dataset, and second, it is the only one with high-resolution LiDAR data (LiDAR-256) enriched with 4D Radar point clouds logged in both daytime and nighttime conditions in a controlled weather environment. Furthermore, REHEARSE-3D involves rain-characteristic information, which is of significant value not only for sensor noise modeling but also for analyzing the impact of weather at a point level. Leveraging REHEARSE-3D, we benchmark raindrop detection and removal in fused LiDAR and 4D Radar point clouds. Our comprehensive study further evaluates the performance of various statistical and deep-learning models. Upon publication, the dataset and benchmark models will be made publicly available at: https://sporsho.github.io/REHEARSE3D.
Symbolic and User-friendly Geometric Algebra Routines (SUGAR) for Computations in Matlab
Geometric algebra (GA) is a mathematical tool for geometric computing, providing a framework that allows a unified and compact approach to geometric relations which in other mathematical systems are typically described using different more complicated elements. This fact has led to an increasing adoption of GA in applied mathematics and engineering problems. However, the scarcity of symbolic implementations of GA and its inherent complexity, requiring a specific mathematical background, make it challenging and less intuitive for engineers to work with. This prevents wider adoption among more applied professionals. To address this challenge, this paper introduces SUGAR (Symbolic and User-friendly Geometric Algebra Routines), an open-source toolbox designed for Matlab and licensed under the MIT License. SUGAR facilitates the translation of GA concepts into Matlab and provides a collection of user-friendly functions tailored for GA computations, including support for symbolic operations. It supports both numeric and symbolic computations in high-dimensional GAs. Specifically tailored for applied mathematics and engineering applications, SUGAR has been meticulously engineered to represent geometric elements and transformations within two and three-dimensional projective and conformal geometric algebras, aligning with established computational methodologies in the literature. Furthermore, SUGAR efficiently handles functions of multivectors, such as exponential, logarithmic, sinusoidal, and cosine functions, enhancing its applicability across various engineering domains, including robotics, control systems, and power electronics. Finally, this work includes four distinct validation examples, demonstrating SUGAR's capabilities across the above-mentioned fields and its practical utility in addressing real-world applied mathematics and engineering problems.
comment: 33 pages, 6 figures, journal paper accepted in ACM TOMS
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Geometrically accurate and semantically expressive map representations have proven invaluable to facilitate robust and safe mobile robot navigation and task planning. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments is still an open problem. In this paper we present FindAnything, an open-world mapping and exploration framework that incorporates vision-language information into dense volumetric submaps. Thanks to the use of vision-language features, FindAnything bridges the gap between pure geometric and open-vocabulary semantic information for a higher level of understanding while allowing to explore any environment without the help of any external source of ground-truth pose information. We represent the environment as a series of volumetric occupancy submaps, resulting in a robust and accurate map representation that deforms upon pose updates when the underlying SLAM system corrects its drift, allowing for a locally consistent representation between submaps. Pixel-wise vision-language features are aggregated from efficient SAM (eSAM)-generated segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from open-vocabulary queries to 3D geometry that is scalable also in terms of memory usage. The open-vocabulary map representation of FindAnything achieves state-of-the-art semantic accuracy in closed-set evaluations on the Replica dataset. This level of scene understanding allows a robot to explore environments based on objects or areas of interest selected via natural language queries. Our system is the first of its kind to be deployed on resource-constrained devices, such as MAVs, leveraging vision-language information for real-world robotic tasks.
comment: 11 pages, 5 figures
SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning
Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achieved in controlled real-world scenarios, these methods often struggle with compliance and adaptability when encountering environments or disturbances unseen during training, potentially resulting in extreme or unsafe behaviors. Inspired by how animals achieve smooth and adaptive movements by controlling muscle extension and contraction, torque-based policies offer a promising alternative by enabling precise and direct control of the actuators in torque space. In principle, this approach facilitates more effective interactions with the environment, resulting in safer and more adaptable behaviors. However, challenges such as a highly nonlinear state space and inefficient exploration during training have hindered their broader adoption. To address these limitations, we propose SATA, a bio-inspired framework that mimics key biomechanical principles and adaptive learning mechanisms observed in animal locomotion. Our approach effectively addresses the inherent challenges of learning torque-based policies by significantly improving early-stage exploration, leading to high-performance final policies. Remarkably, our method achieves zero-shot sim-to-real transfer. Our experimental results indicate that SATA demonstrates remarkable compliance and safety, even in challenging environments such as soft/slippery terrain or narrow passages, and under significant external disturbances, highlighting its potential for practical deployments in human-centric and safety-critical scenarios.
LUDO: Low-Latency Understanding of Deformable Objects using Point Cloud Occupancy Functions
Accurately determining the shape of objects and the location of their internal structures within deformable objects is crucial for medical tasks that require precise targeting, such as robotic biopsies. We introduce LUDO, a method for accurate low-latency understanding of deformable objects. LUDO reconstructs objects in their deformed state, including their internal structures, from a single-view point cloud observation in under 30 ms using occupancy networks. LUDO provides uncertainty estimates for its predictions. Additionally, it provides explainability by highlighting key features in its input observations. Both uncertainty and explainability are important for safety-critical applications such as surgical interventions. We demonstrate LUDO's abilities for autonomous targeting of internal regions of interest (ROIs) in deformable objects. We evaluate LUDO in real-world robotic experiments, achieving a success rate of 98.9% for puncturing various ROIs inside deformable objects. LUDO demonstrates the potential to interact with deformable objects without the need for deformable registration methods.
An Efficient GPU-based Implementation for Noise Robust Sound Source Localization
Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Despite their wide applicability, processing multi-channel audio signals from microphone arrays in SSL involves computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition, utilizing the Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within the HARK platform, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 5648.7x for GSVD calculations and 10.7x for the SSL module, while speedups of 4245.1x for GSVD calculation and 17.3x for the entire SSL module on a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs, making real-time processing feasible for large-scale microphone arrays and providing ample capacity for real-time processing of potential subsequent machine learning or deep learning tasks.
comment: 6 pages, 2 figures
Deployment-friendly Lane-changing Intention Prediction Powered by Brain-inspired Spiking Neural Networks
Accurate and real-time prediction of surrounding vehicles' lane-changing intentions is a critical challenge in deploying safe and efficient autonomous driving systems in open-world scenarios. Existing high-performing methods remain hard to deploy due to their high computational cost, long training times, and excessive memory requirements. Here, we propose an efficient lane-changing intention prediction approach based on brain-inspired Spiking Neural Networks (SNN). By leveraging the event-driven nature of SNN, the proposed approach enables us to encode the vehicle's states in a more efficient manner. Comparison experiments conducted on HighD and NGSIM datasets demonstrate that our method significantly improves training efficiency and reduces deployment costs while maintaining comparable prediction accuracy. Particularly, compared to the baseline, our approach reduces training time by 75% and memory usage by 99.9%. These results validate the efficiency and reliability of our method in lane-changing predictions, highlighting its potential for safe and efficient autonomous driving systems while offering significant advantages in deployment, including reduced training time, lower memory usage, and faster inference.
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding NeurIPS 2024
Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. Our evaluation spans seven vision foundation encoders, including image-based, video-based, and 3D foundation models. We evaluate these models in four tasks: Vision-Language Scene Reasoning, Visual Grounding, Segmentation, and Registration, each focusing on different aspects of scene understanding. Our evaluations yield key findings: DINOv2 demonstrates superior performance, video models excel in object-level tasks, diffusion models benefit geometric tasks, and language-pretrained models show unexpected limitations in language-related tasks. These insights challenge some conventional understandings, provide novel perspectives on leveraging visual foundation models, and highlight the need for more flexible encoder selection in future vision-language and scene-understanding tasks. Code: https://github.com/YunzeMan/Lexicon3D
comment: NeurIPS 2024. Project page: https://yunzeman.github.io/lexicon3d Github: https://github.com/YunzeMan/Lexicon3D
DiSPo: Diffusion-SSM based Policy Learning for Coarse-to-Fine Action Discretization
We aim to solve the problem of generating coarse-to-fine skills learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations with external interpolations or dynamics models with limited generalization capabilities. For memory-efficient learning and convenient granularity change, we propose a novel diffusion-SSM based policy (DiSPo) that learns from diverse coarse skills and produces varying control scales of actions by leveraging a state-space model, Mamba. Our evaluations show the adoption of Mamba and the proposed step-scaling method enable DiSPo to outperform in three coarse-to-fine benchmark tests with maximum 81% higher success rate than baselines. In addition, DiSPo improves inference efficiency by generating coarse motions in less critical regions. We finally demonstrate the scalability of actions with simulation and real-world manipulation tasks.
comment: 12 pages, 10 figures
Open-Source, Cost-Aware Kinematically Feasible Planning for Mobile and Surface Robotics
We present Smac Planner, an openly available, search-based planning framework that addresses the critical need for kinematically feasible path planning across diverse robot platforms. Smac Planner provides high-performance implementations of Cost-Aware A*, Hybrid-A*, and State Lattice planners that can be deployed for Ackermann, legged, and other large non-circular robots. Our framework introduces novel "Cost-Aware" variations that significantly improve performance in complex environments common to mobile robotics while maintaining kinematic feasibility constraints. Integrated as the standard planning system within the popular ROS 2 Navigation stack, Nav2, Smac Planner now powers thousands of robots worldwide across academic research, commercial applications, and field deployments.
Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning ICRA
In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transfer constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework in the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieve a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
comment: This manuscript has been accepted by 2025 IEEE International Conference on Robotics & Automation (ICRA)
Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games
Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data, while a low-level RL policy reasons about evasive versus global path-following behavior. The benchmark results across different domains and different observability show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal reaching rate, which leads to 51.4% increasing of the performance score on average. Additionally, our method improves interpretability, flexibility and efficiency of the learned policy.
comment: This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication
KineSoft: Learning Proprioceptive Manipulation Policies with Soft Robot Hands
Underactuated soft robot hands offer inherent safety and adaptability advantages over rigid systems, but developing dexterous manipulation skills remains challenging. While imitation learning shows promise for complex manipulation tasks, traditional approaches struggle with soft systems due to demonstration collection challenges and ineffective state representations. We present KineSoft, a framework enabling direct kinesthetic teaching of soft robotic hands by leveraging their natural compliance as a skill teaching advantage rather than only as a control challenge. KineSoft makes two key contributions: (1) an internal strain sensing array providing occlusion-free proprioceptive shape estimation, and (2) a shape-based imitation learning framework that uses proprioceptive feedback with a low-level shape-conditioned controller to ground diffusion-based policies. This enables human demonstrators to physically guide the robot while the system learns to associate proprioceptive patterns with successful manipulation strategies. We validate KineSoft through physical experiments, demonstrating superior shape estimation accuracy compared to baseline methods, precise shape-trajectory tracking, and higher task success rates compared to baseline imitation learning approaches.
Multiagent Systems
Empowering Scientific Workflows with Federated Agents
Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, the agentic frameworks used to build these systems have not previously enabled use with research cyberinfrastructure. Here we introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, Academy supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control. We present microbenchmark results that demonstrate high performance and scalability in HPC environments. To demonstrate the breadth of applications that can be supported by agentic workflow designs, we also present case studies in materials discovery, decentralized learning, and information extraction in which agents are deployed across diverse HPC systems.
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools to exercise our critical thinking rather than replacing them. We introduce the concept of 'reasonable parrots' that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.
Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration ICML 2025
Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policies which encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.
comment: Accepted (Poster) at ICML 2025
USPR: Learning a Unified Solver for Profiled Routing
The Profiled Vehicle Routing Problem (PVRP) extends the classical VRP by incorporating vehicle-client-specific preferences and constraints, reflecting real-world requirements such as zone restrictions and service-level preferences. While recent reinforcement learning (RL) solvers have shown promise, they require retraining for each new profile distribution, suffer from poor representation ability, and struggle to generalize to out-of-distribution instances. In this paper, we address these limitations by introducing USPR (Unified Solver for Profiled Routing), a novel framework that natively handles arbitrary profile types. USPR introduces three key innovations: (i) Profile Embeddings (PE) to encode any combination of profile types; (ii) Multi-Head Profiled Attention (MHPA), an attention mechanism that models rich interactions between vehicles and clients; (iii) Profile-aware Score Reshaping (PSR), which dynamically adjusts decoder logits using profile scores to improve generalization. Empirical results on diverse PVRP benchmarks demonstrate that USPR achieves state-of-the-art results among learning-based methods while offering significant gains in flexibility and computational efficiency. We make our source code publicly available to foster future research at https://github.com/ai4co/uspr.
Multi-agent Embodied AI: Advances and Future Directions
Embodied artificial intelligence (Embodied AI) plays a pivotal role in the application of advanced technologies in the intelligent era, where AI systems are integrated with physical bodies that enable them to perceive, reason, and interact with their environments. Through the use of sensors for input and actuators for action, these systems can learn and adapt based on real-world feedback, allowing them to perform tasks effectively in dynamic and unpredictable environments. As techniques such as deep learning (DL), reinforcement learning (RL), and large language models (LLMs) mature, embodied AI has become a leading field in both academia and industry, with applications spanning robotics, healthcare, transportation, and manufacturing. However, most research has focused on single-agent systems that often assume static, closed environments, whereas real-world embodied AI must navigate far more complex scenarios. In such settings, agents must not only interact with their surroundings but also collaborate with other agents, necessitating sophisticated mechanisms for adaptation, real-time learning, and collaborative problem-solving. Despite increasing interest in multi-agent systems, existing research remains narrow in scope, often relying on simplified models that fail to capture the full complexity of dynamic, open environments for multi-agent embodied AI. Moreover, no comprehensive survey has systematically reviewed the advancements in this area. As embodied AI rapidly evolves, it is crucial to deepen our understanding of multi-agent embodied AI to address the challenges presented by real-world applications. To fill this gap and foster further development in the field, this paper reviews the current state of research, analyzes key contributions, and identifies challenges and future directions, providing insights to guide innovation and progress in this field.
A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons
The tragedy of the commons, where individual self-interest leads to collectively disastrous outcomes, is a pervasive challenge in human society. Recent studies have demonstrated that similar phenomena can arise in generative multi-agent systems (MASs). To address this challenge, this paper explores the use of reputation systems as a remedy. We propose RepuNet, a dynamic, dual-level reputation framework that models both agent-level reputation dynamics and system-level network evolution. Specifically, driven by direct interactions and indirect gossip, agents form reputations for both themselves and their peers, and decide whether to connect or disconnect other agents for future interactions. Through two distinct scenarios, we show that RepuNet effectively mitigates the 'tragedy of the commons', promoting and sustaining cooperation in generative MASs. Moreover, we find that reputation systems can give rise to rich emergent behaviors in generative MASs, such as the formation of cooperative clusters, the social isolation of exploitative agents, and the preference for sharing positive gossip rather than negative ones.
Foam-Agent: Towards Automated Intelligent CFD Workflows
Computational Fluid Dynamics (CFD) is an essential simulation tool in various engineering disciplines, but it often requires substantial domain expertise and manual configuration, creating barriers to entry. We present Foam-Agent, a multi-agent framework that automates complex OpenFOAM-based CFD simulation workflows from natural language inputs. Our innovation includes (1) a hierarchical multi-index retrieval system with specialized indices for different simulation aspects, (2) a dependency-aware file generation system that provides consistency management across configuration files, and (3) an iterative error correction mechanism that diagnoses and resolves simulation failures without human intervention. Through comprehensive evaluation on the dataset of 110 simulation tasks, Foam-Agent achieves an 83.6% success rate with Claude 3.5 Sonnet, significantly outperforming existing frameworks (55.5% for MetaOpenFOAM and 37.3% for OpenFOAM-GPT). Ablation studies demonstrate the critical contribution of each system component, with the specialized error correction mechanism providing a 36.4% performance improvement. Foam-Agent substantially lowers the CFD expertise threshold while maintaining modeling accuracy, demonstrating the potential of specialized multi-agent systems to democratize access to complex scientific simulation tools. The code is public at https://github.com/csml-rpi/Foam-Agent
A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
This research introduces an innovative AI-driven multi-agent framework specifically designed for creating immersive audiobooks. Leveraging neural text-to-speech synthesis with FastSpeech 2 and VALL-E for expressive narration and character-specific voices, the framework employs advanced language models to automatically interpret textual narratives and generate realistic spatial audio effects. These sound effects are dynamically synchronized with the storyline through sophisticated temporal integration methods, including Dynamic Time Warping (DTW) and recurrent neural networks (RNNs). Diffusion-based generative models combined with higher-order ambisonics (HOA) and scattering delay networks (SDN) enable highly realistic 3D soundscapes, substantially enhancing listener immersion and narrative realism. This technology significantly advances audiobook applications, providing richer experiences for educational content, storytelling platforms, and accessibility solutions for visually impaired audiences. Future work will address personalization, ethical management of synthesized voices, and integration with multi-sensory platforms.
CCL: Collaborative Curriculum Learning for Sparse-Reward Multi-Agent Reinforcement Learning via Co-evolutionary Task Evolution
Sparse reward environments pose significant challenges in reinforcement learning, especially within multi-agent systems (MAS) where feedback is delayed and shared across agents, leading to suboptimal learning. We propose Collaborative Multi-dimensional Course Learning (CCL), a novel curriculum learning framework that addresses this by (1) refining intermediate tasks for individual agents, (2) using a variational evolutionary algorithm to generate informative subtasks, and (3) co-evolving agents with their environment to enhance training stability. Experiments on five cooperative tasks in the MPE and Hide-and-Seek environments show that CCL outperforms existing methods in sparse reward settings.
The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete
According to Yuval Noah Harari, large-scale human cooperation is driven by shared narratives that encode common beliefs and values. This study explores whether such narratives can similarly nudge LLM agents toward collaboration. We use a finitely repeated public goods game in which LLM agents choose either cooperative or egoistic spending strategies. We prime agents with stories highlighting teamwork to different degrees and test how this influences negotiation outcomes. Our experiments explore four questions:(1) How do narratives influence negotiation behavior? (2) What differs when agents share the same story versus different ones? (3) What happens when the agent numbers grow? (4) Are agents resilient against self-serving negotiators? We find that story-based priming significantly affects negotiation strategies and success rates. Common stories improve collaboration, benefiting each agent. By contrast, priming agents with different stories reverses this effect, and those agents primed toward self-interest prevail. We hypothesize that these results carry implications for multi-agent system design and AI alignment.
comment: 16 pages, 8 figures. Code available at https://github.com/storyagents25/story-agents
Inverse Inference on Cooperative Control of Networked Dynamical Systems
Recent years have witnessed the rapid advancement of understanding the control mechanism of networked dynamical systems (NDSs), which are governed by components such as nodal dynamics and topology. This paper reveals that the critical components in continuous-time state feedback cooperative control of NDSs can be inferred merely from discrete observations. In particular, we advocate a bi-level inference framework to estimate the global closed-loop system and extract the components, respectively. The novelty lies in bridging the gap from discrete observations to the continuous-time model and effectively decoupling the concerned components. Specifically, in the first level, we design a causality-based estimator for the discrete-time closed-loop system matrix, which can achieve asymptotically unbiased performance when the NDS is stable. In the second level, we introduce a matrix logarithm based method to recover the continuous-time counterpart matrix, providing new sampling period guarantees and establishing the recovery error bound. By utilizing graph properties of the NDS, we develop least square based procedures to decouple the concerned components with up to a scalar ambiguity. Furthermore, we employ inverse optimal control techniques to reconstruct the objective function driving the control process, deriving necessary conditions for the solutions. Numerical simulations demonstrate the effectiveness of the proposed method.
comment: 14 pages
When the Universe is Too Big: Bounding Consideration Probabilities for Plackett-Luce Rankings AISTATS '25
The widely used Plackett-Luce ranking model assumes that individuals rank items by making repeated choices from a universe of items. But in many cases the universe is too big for people to plausibly consider all options. In the choice literature, this issue has been addressed by supposing that individuals first sample a small consideration set and then choose among the considered items. However, inferring unobserved consideration sets (or item consideration probabilities) in this "consider then choose" setting poses significant challenges, because even simple models of consideration with strong independence assumptions are not identifiable, even if item utilities are known. We apply the consider-then-choose framework to top-$k$ rankings, where we assume rankings are constructed according to a Plackett-Luce model after sampling a consideration set. While item consideration probabilities remain non-identified in this setting, we prove that we can infer bounds on the relative values of consideration probabilities. Additionally, given a condition on the expected consideration set size and known item utilities, we derive absolute upper and lower bounds on item consideration probabilities. We also provide algorithms to tighten those bounds on consideration probabilities by propagating inferred constraints. Thus, we show that we can learn useful information about consideration probabilities despite not being able to identify them precisely. We demonstrate our methods on a ranking dataset from a psychology experiment with two different ranking tasks (one with fixed consideration sets and one with unknown consideration sets). This combination of data allows us to estimate utilities and then learn about unknown consideration probabilities using our bounds.
comment: 25 pages; accepted to AISTATS '25; early version was an extended abstract at AAMAS '24
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Cities are not monolithic; they are arenas of negotiation among groups that hold varying needs, values, and experiences. Conventional methods of urban assessment -- from standardized surveys to AI-driven evaluations -- frequently rely on a single consensus metric (e.g., an average measure of inclusivity or safety). Although such aggregations simplify design decisions, they risk obscuring the distinct perspectives of marginalized populations. In this paper, we present findings from a community-centered study in Montreal involving 35 residents with diverse demographic and social identities, particularly wheelchair users, seniors, and LGBTQIA2+ individuals. Using rating and ranking tasks on 20 urban sites, we observe that disagreements are systematic rather than random, reflecting structural inequalities, differing cultural values, and personal experiences of safety and accessibility. Based on these empirical insights, we propose negotiative alignment, an AI framework that treats disagreement as an essential input to be preserved, analyzed, and addressed. Negotiative alignment builds on pluralistic models by dynamically updating stakeholder preferences through multi-agent negotiation mechanisms, ensuring no single perspective is marginalized. We outline how this framework can be integrated into urban analytics -- and other decision-making contexts -- to retain minority viewpoints, adapt to changing stakeholder concerns, and enhance fairness and accountability. The study demonstrates that preserving and engaging with disagreement, rather than striving for an artificial consensus, can produce more equitable and responsive AI-driven outcomes in urban design.
comment: 16 pages, 13 figures
Robust Coordination under Misaligned Communication via Power Regularization
Effective communication in Multi-Agent Reinforcement Learning (MARL) can significantly enhance coordination and collaborative performance in complex and partially observable environments. However, reliance on communication can also introduce vulnerabilities when agents are misaligned, potentially leading to adversarial interactions that exploit implicit assumptions of cooperative intent. Prior work has addressed adversarial behavior through power regularization through controlling the influence one agent exerts over another, but has largely overlooked the role of communication in these dynamics. This paper introduces Communicative Power Regularization (CPR), extending power regularization specifically to communication channels. By explicitly quantifying and constraining agents' communicative influence during training, CPR actively mitigates vulnerabilities arising from misaligned or adversarial communications. Evaluations across benchmark environments Red-Door-Blue-Door, Predator-Prey, and Grid Coverage demonstrate that our approach significantly enhances robustness to adversarial communication while preserving cooperative performance, offering a practical framework for secure and resilient cooperative MARL systems.
Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games
Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data, while a low-level RL policy reasons about evasive versus global path-following behavior. The benchmark results across different domains and different observability show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal reaching rate, which leads to 51.4% increasing of the performance score on average. Additionally, our method improves interpretability, flexibility and efficiency of the learned policy.
comment: This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication
Systems and Control (CS)
Approximation-free Control for Signal Temporal Logic Specifications using Spatiotemporal Tubes
This paper presents a spatiotemporal tube (STT)-based control framework for satisfying Signal Temporal Logic (STL) specifications in unknown control-affine systems. We formulate STL constraints as a robust optimization problem (ROP) and recast it as a scenario optimization program (SOP) to construct STTs with formal correctness guarantees. We also propose a closed-form control law that operates independently of the system dynamics, and ensures the system trajectory evolves within the STTs, thereby satisfying the STL specifications. The proposed approach is validated through case studies and comparisons with state-of-the-art methods, demonstrating superior computational efficiency, trajectory quality, and applicability to complex STL tasks.
Localization and path following for an autonomous e-scooter
In order to mitigate economical, ecological, and societal challenges in electric scooter (e-scooter) sharing systems, we develop an autonomous e-scooter prototype. Our vision is to design a fully autonomous prototype that can find its way to the next parking spot, high-demand area, or charging station. In this work, we propose a path following solution to enable localization and navigation in an urban environment with a provided path to follow. We design a closed-loop architecture that solves the localization and path following problem while allowing the e-scooter to maintain its balance with a previously developed reaction wheel mechanism. Our approach facilitates state and input constraints, e.g., adhering to the path width, while remaining executable on a Raspberry Pi 5. We demonstrate the efficacy of our approach in a real-world experiment on our prototype.
Optimal Microgrid Sizing of Offshore Renewable Energy Sources for Offshore Platforms and Coastal Communities
The global energy landscape is undergoing a transformative shift towards renewable energy and advanced storage solutions, driven by the urgent need for sustainable and resilient power systems. Isolated offshore communities, such as islands and offshore platforms, which traditionally rely on mainland grids or diesel generators, stand to gain significantly from renewable energy integration. Promising offshore renewable technologies include wind turbines, wave and tidal energy converters, and floating photovoltaic systems, paired with a storage solution like battery energy storage systems. This paper introduces a renewable energy microgrid optimizer (REMO), a tool designed to identify the optimal sizes of renewable generation and storage resources for offshore microgrids. A key challenge in such models is accurately accounting for battery degradation costs. To address this, the REMO model integrates a deep neural network-based battery degradation (DNN-BD) module, which factors in variables like ambient temperature, charge/discharge rates, state of charge, depth of discharge and battery health. Simulations on six test regions demonstrate that the REMO-DNN-BD approach minimizes lifetime energy costs while maintaining high reliability and sustainability, making it a viable design solution for offshore microgrid systems.
CV-MP: Max-Pressure Control in Heterogeneously Distributed and Partially Connected Vehicle Environments
Max-pressure (MP) control has emerged as a prominent real-time network traffic signal control strategy due to its simplicity, decentralized structure, and theoretical guarantees of network queue stability. Meanwhile, advances in connected vehicle (CV) technology have sparked extensive research into CV-based traffic signal control. Despite these developments, few studies have investigated MP control in heterogeneously distributed and partially CV environments while ensuring network queue stability. To address these research gaps, we propose a CV-based MP control (CV-MP) method that leverages real-time CV travel time information to compute the pressure, thereby incorporating both the spatial distribution and temporal delays of vehicles, unlike existing approaches that utilized only spatial distribution or temporal delays. In particular, we establish sufficient conditions for road network queue stability that are compatible with most existing MP control methods. Moreover, we pioneered the proof of network queue stability even if the vehicles are only partially connected and heterogeneously distributed, and gave a necessary condition of CV observation for maintaining the stability. Evaluation results on an Amsterdam corridor show that CV-MP significantly reduces vehicle delays compared to both actuated control and conventional MP control across various CV penetration rates. Moreover, in scenarios with dynamic traffic demand, CV-MP achieves lower spillover peaks even with low and heterogeneous CV penetration rates, further highlighting its effectiveness and robustness.
High Altitude Platform-Based Caching and Multicasting for Rural Connectivity
Providing efficient and reliable content delivery in rural areas remains a significant challenge due to the lack of communication infrastructure. To bridge the digital divide, this paper investigates the potential of leveraging multiple high-altitude platforms (HAPs) for energy-efficient content delivery in wide rural regions. Each caching-enabled HAP is equipped with both Free-Space Optical (FSO) transceivers for backhaul links and Radio Frequency (RF) antenna arrays for access links. To further enhance network efficiency, we consider a network coding-based multicasting scheme, where different types of content are treated as distinct multicast sessions. With the objective of minimizing long-term power cost, we propose a hierarchical framework that integrates deep reinforcement learn-ing (DRL) and convex optimization to jointly optimize dynamic caching strategies and resource allocation across the network. Simulation results demonstrate that our approach significantly reduces power cost compared to several baseline approaches, providing a practical solution for improving rural connectivity.
comment: 13 pages, 8 figures, submitted to IEEE journals for possible publication
Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network
Federated Learning (FL) has revolutionized collaborative model training in distributed networks, prioritizing data privacy and communication efficiency. This paper investigates efficient deployment of FL in wireless heterogeneous networks, focusing on strategies to accelerate convergence despite stragglers. The primary objective is to minimize long-term convergence wall-clock time through optimized user scheduling and resource allocation. While stragglers may introduce delays in a single round, their inclusion can expedite subsequent rounds, particularly when they possess critical information. Moreover, balancing single-round duration with the number of cumulative rounds, compounded by dynamic training and transmission conditions, necessitates a novel approach beyond conventional optimization solutions. To tackle these challenges, convergence analysis with respect to adaptive and biased scheduling is derived. Then, by factoring in real-time system and statistical information, including diverse energy constraints and users' energy harvesting capabilities, a deep reinforcement learning approach, empowered by proximal policy optimization, is employed to adaptively select user sets. For the scheduled users, Lagrangian decomposition is applied to optimize local resource utilization, further enhancing system efficiency. Simulation results validate the effectiveness and robustness of the proposed framework for various FL tasks, demonstrating reduced task time compared to existing benchmarks under various settings.
comment: 13 pages, 9 figures
LAPSO: A Unified Optimization View for Learning-Augmented Power System Operations
With the high penetration of renewables, traditional model-based power system operation is challenged to deliver economic, stable, and robust decisions. Machine learning has emerged as a powerful modeling tool for capturing complex dynamics to address these challenges. However, its separate design often lacks systematic integration with existing methods. To fill the gap, this paper proposes a holistic framework of Learning-Augmented Power System Operations (LAPSO, pronounced as Lap-So). Adopting a native optimization perspective, LAPSO is centered on the operation stage and aims to break the boundary between temporally siloed power system tasks, such as forecast, operation and control, while unifying the objectives of machine learning and model-based optimizations at both training and inference stages. Systematic analysis and simulations demonstrate the effectiveness of applying LAPSO in designing new integrated algorithms, such as stability-constrained optimization (SCO) and objective-based forecasting (OBF), while enabling end-to-end tracing of different sources of uncertainties. In addition, a dedicated Python package-lapso is introduced to automatically augment existing power system optimization models with learnable components. All code and data are available at https://github.com/xuwkk/lapso_exp.
Day-Ahead Bidding Strategies for Wind Farm Operators under a One-Price Balancing Scheme
We study day-ahead bidding strategies for wind farm operators under a one-price balancing scheme, prevalent in European electricity markets. In this setting, the profit-maximising strategy becomes an all-or-nothing strategy, aiming to take advantage of open positions in the balancing market. However, balancing prices are difficult, if not impossible, to forecast in the day-ahead stage and large open positions can affect the balancing price by changing the direction of the system imbalance. This paper addresses day-ahead bidding as a decision-making problem under uncertainty, with the objective of maximising the expected profit while reducing the imbalance risk related to the strategy. To this end, we develop a stochastic optimisation problem with explicit constraints on the positions in the balancing market, providing risk certificates, and derive an analytical solution to this problem. Moreover, we show how the price-impact of the trading strategy on the balancing market can be included in the ex-post evaluation. Using real data from the Belgian electricity market and an offshore wind farm in the North Sea, we demonstrate that the all-or-nothing strategy negatively impacts the balancing price, resulting in long-term losses for the wind farm. Our risk-constrained strategy, however, can still significantly enhance operational profit compared to traditional point-forecast bidding.
Predictive Control of EV Overnight Charging with Multi-Session Flexibility
The majority of electric vehicles (EVs) are charged domestically overnight, where the precise timing of power allocation is not important to the user, thus representing a source of flexibility that can be leveraged by charging control algorithms. In this paper, we relax the common assumption, that EVs require full charge every morning, enabling additional flexibility to defer charging of surplus energy to subsequent nights, which can enhance the performance of controlled charging. In particular, we consider a simple domestic smart plug, scheduling power delivery with the objective to minimize CO$_2$ emissions over prediction horizons of multiple sessions -- up to seven days ahead -- utilising model predictive control (MPC). Based on carbon intensity data from the UK National Grid, we demonstrate significant potential for emission reductions with multi-session planning of 40 to 46\% compared to uncontrolled charging and 19 to 26\% compared to single-session planning. Furthermore, we assess, how the driving and charging behaviour of EV users affects the available flexibility and consequentially the potential for emission reductions. Finally, using grid carbon intensity data from 14 different UK regions, we report significant variations in absolute emission reductions based on the local energy mix.
Enhanced Robust Tracking Control: An Online Learning Approach
This work focuses the tracking control problem for nonlinear systems subjected to unknown external disturbances. Inspired by contraction theory, a neural network-dirven CCM synthesis is adopted to obtain a feedback controller that could track any feasible trajectory. Based on the observation that the system states under continuous control input inherently contain embedded information about unknown external disturbances, we propose an online learning scheme that captures the disturbances dyanmics from online historical data and embeds the compensation within the CCM controller. The proposed scheme operates as a plug-and-play module that intrinsically enhances the tracking performance of CCM synthesis. The numerical simulations on tethered space robot and PVTOL demonstrate the effectiveness of proposed scheme. The source code of the proposed online learning scheme can be found at https://github.com/NPU-RCIR/Online_CCM.git.
Semi-Explicit Solution of Some Discrete-Time Mean-Field-Type Games with Higher-Order Costs
Traditional solvable game theory and mean-field-type game theory (risk-aware games) predominantly focus on quadratic costs due to their analytical tractability. Nevertheless, they often fail to capture critical non-linearities inherent in real-world systems. In this work, we present a unified framework for solving discrete-time game problems with higher-order state and strategy costs involving power-law terms. We derive semi-explicit expressions for equilibrium strategies, cost-to-go functions, and recursive coefficient dynamics across deterministic, stochastic, and multi-agent system settings by convex-completion techniques. The contributions include variance-aware solutions under additive and multiplicative noise, extensions to mean-field-type-dependent dynamics, and conditions that ensure the positivity of recursive coefficients. Our results provide a foundational methodology for analyzing non linear multi-agent systems under higher-order penalization, bridging classical game theory and mean-field-type game theory with modern computational tools for engineering applications.
comment: arXiv admin note: text overlap with arXiv:2505.04112
A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ("edge cases") that go beyond the autonomous capabilities - such as construction zones or encounters with emergency responders - the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, both in autonomous and remotely operated modes.
comment: Intelligent Vehicles Symposium 2025
LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture
This paper proposes a novel Large Vision-Language Model (LVLM) and Model Predictive Control (MPC) integration framework that delivers both task scalability and safety for Autonomous Driving (AD). LVLMs excel at high-level task planning across diverse driving scenarios. However, since these foundation models are not specifically designed for driving and their reasoning is not consistent with the feasibility of low-level motion planning, concerns remain regarding safety and smooth task switching. This paper integrates LVLMs with MPC Builder, which automatically generates MPCs on demand, based on symbolic task commands generated by the LVLM, while ensuring optimality and safety. The generated MPCs can strongly assist the execution or rejection of LVLM-driven task switching by providing feedback on the feasibility of the given tasks and generating task-switching-aware MPCs. Our approach provides a safe, flexible, and adaptable control framework, bridging the gap between cutting-edge foundation models and reliable vehicle operation. We demonstrate the effectiveness of our approach through a simulation experiment, showing that our system can safely and effectively handle highway driving while maintaining the flexibility and adaptability of LVLMs.
comment: 8 pages, 8 figures
Learning Linearized Models from Nonlinear Systems under Initialization Constraints with Finite Data
The identification of a linear system model from data has wide applications in control theory. The existing work that provides finite sample guarantees for linear system identification typically uses data from a single long system trajectory under i.i.d. random inputs, and assumes that the underlying dynamics is truly linear. In contrast, we consider the problem of identifying a linearized model when the true underlying dynamics is nonlinear, given that there is a certain constraint on the region where one can initialize the experiments. We provide a multiple trajectories-based deterministic data acquisition algorithm followed by a regularized least squares algorithm, and provide a finite sample error bound on the learned linearized dynamics. Our error bound shows that one can consistently learn the linearized dynamics, and demonstrates a trade-off between the error due to nonlinearity and the error due to noise. We validate our results through numerical experiments, where we also show the potential insufficiency of linear system identification using a single trajectory with i.i.d. random inputs, when nonlinearity does exist.
comment: 12 pages, 5 figurues. arXiv admin note: substantial text overlap with arXiv:2309.08805
Learning Economic Model Predictive Control via Clustering and Kernel-Based Lipschitz Regression
This paper presents a novel learning economic model predictive control scheme for uncertain nonlinear systems subject to input and state constraints and unknown dynamics. We design a fast and accurate Lipschitz regression method using input and output data that combines clustering and kernel regression to learn the unknown dynamics. In each cluster, the parallel convex optimization problems are solved to estimate the kernel weights and reduce the Lipschitz constant of the predictor, hence limiting the error propagation in the prediction horizon. We derive the two different bounds of learning errors in deterministic and probabilistic forms and customize a new robust constraint-tightening strategy for the discontinuous predictor. Then, the learning economic model predictive control algorithm is formulated by introducing a stabilized optimization problem to construct a Lyapunov function. Sufficient conditions are derived to ensure the recursive feasibility and input-to-state stability of the closed-loop system. The effectiveness of the proposed algorithm is verified by simulations of a numerical example and a continuously stirred tank reactor.
comment: Submitted to the Journal of the Franklin Institute; currently under review
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/continental/pred2plan.
Learning to Drive Anywhere with Model-Based Reannotation11
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars
Reinforcement Learning (RL) has enabled vast performance improvements for robotics systems. To achieve these results though, the agent often must randomly explore the environment, which for safety critical systems presents a significant challenge. Barrier functions can solve this challenge by enabling an override that approximates the RL control input as closely as possible without violating a safety constraint. Unfortunately, this override can be computationally intractable in cases where the dynamics are not convex in the control input or when time is discrete, as is often the case when training RL systems. We therefore consider these cases, developing novel barrier functions for two non-convex systems (fixed wing aircraft and self-driving cars performing lane merging with adaptive cruise control) in discrete time. Although solving for an online and optimal override is in general intractable when the dynamics are nonconvex in the control input, we investigate approximate solutions, finding that these approximations enable performance commensurate with baseline RL methods with zero safety violations. In particular, even without attempting to solve for the optimal override at all, performance is still competitive with baseline RL performance. We discuss the tradeoffs of the approximate override solutions including performance and computational tractability.
comment: This work has been submitted to the IEEE for possible publication
Model-Based Closed-Loop Control Algorithm for Stochastic Partial Differential Equation Control
Neural operators have demonstrated promise in modeling and controlling systems governed by Partial Differential Equations (PDEs). Beyond PDEs, Stochastic Partial Differential Equations (SPDEs) play a critical role in modeling systems influenced by randomness, with applications in finance, physics, and beyond. However, controlling SPDE-governed systems remains a significant challenge. On the one hand, the regularity of the system's state (which can be intuitively understood as smoothness) deteriorates, making modeling and generalization more challenging. On the other hand, this stochasticity also renders control more unstable and thus less accurate. To address this gap, we propose the Model-Based Closed-Loop Control Algorithm (MB-CC), the first model-based closed-loop control method for SPDEs. MB-CC introduces two key innovations to enhance control robustness and efficiency: a Regularity Feature (RF) block and a closed-loop strategy with an operator-encoded policy network. The RF block, inspired by the regularity structure theory of SPDEs, addresses noise-induced irregularities by transforming the network's input, including the system state and noise-perturbed external forces, into a refined feature space for improved forward prediction. Compared to previous works using regularity features, we introduce a new parameterization, data augmentation, and extend the RF block as a plug-and-play component. Additionally, to achieve closed-loop control, we introduce an operator-encoded policy network to map the current state to optimal control, which integrates physical priors and swiftly makes decisions based on states returned by the environment. We conduct a systematic evaluation of MB-CC on two notable SPDEs, showcasing its effectiveness and efficiency. The ablation studies show its ability to handle stochasticity more effectively.
Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization
Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictve Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively.
comment: 8 pages, 5 figures, to published in IEEE Sustech 2025
Fast Whole-Body Strain Regulation in Continuum Robots
We propose reaching steps towards the real-time strain control of multiphysics, multiscale continuum soft robots. To study this problem fundamentally, we ground ourselves in a model-based control setting enabled by mathematically precise dynamics of a soft robot prototype. Poised to integrate, rather than reject, inherent mechanical nonlinearities for embodied compliance, we first separate the original robot dynamics into separate subdynamics -- aided by a perturbing time-scale separation parameter. Second, we prescribe a set of stabilizing nonlinear backstepping controllers for regulating the resulting subsystems' strain dynamics. Third, we study the interconnected singularly perturbed system by analyzing and establishing its stability. Fourth, our theories are backed up by fast numerical results on a single arm of the Octopus robot arm. We demonstrate strain regulation to equilibrium, in a significantly reduced time, of the whole-body reduced-order dynamics of an infinite degrees-of-freedom soft robot. This paper communicates our thinking within the backdrop of embodied intelligence: it informs our conceptualization, formulation, computational setup, and yields improved control performance for infinite degrees-of-freedom soft robots.
Formulating the Restoration of Distribution Networks as a Multiple Traveling Salesman Problem
Severe weather events can cause extensive damage to electrical distribution networks, requiring a multi-day restoration effort. Optimizing the dispatch of repair crews minimizes the severe socio-economic consequences of such events. Considering both repair times and travel times, we use graphical manipulations to transform this multiple crew scheduling problem into a type of traveling salesman problem(TSP). Specifically, we demonstrate that the restoration problem bears major resemblance to an instance of a cost constrained reward maximizing mTSP (multiple TSP) on node and edge weighted (doubly weighted) graphs (a variant we dub the CCRM-mTSP-DW), where the objective is to maximize the aggregate reward earned during the upcoming restoration window, provided no crew violates its time budget and electrical continuity constraints are met. Despite the rich history of research on the TSP and its variants, this CCRM-mTSP-DW variant has not been studied before, although its closest cousin happens to be the "Selective TSP" (S-TSP). This reinterpretation of the restoration problem not only opens up the possibility of drawing on existing solution methods developed for the TSP and its variants, it also adds a new chapter in the annals of research on "TSP-like'' problems. In this paper, we propose a "TSP-like'' mixed integer linear programming (MILP) model for solving the restoration problem and validate it on the IEEE PES 123-node test feeder network.
comment: This is the JOURNAL version (10 pages, 9 figures, 1 table) of the paper. The previous versions (v1 & v2) are the extended version (26 pages, 17 figures, 1 table)
On Space-Filling Input Design for Nonlinear Dynamic Model Learning: A Gaussian Process Approach
While optimal input design for linear systems has been well-established, no systematic approach exists for nonlinear systems where robustness to extrapolation/interpolation errors is prioritized over minimizing estimated parameter variance. To address this issue, we develop a novel space-filling input design strategy for nonlinear system identification that ensures data coverage of a given region of interest. By placing a Gaussian Process (GP) prior on the joint input-state space, the proposed strategy leverages the GP posterior variance to construct a cost function that promotes space-filling input design. Consequently, this enables maximization of the coverage in the region of interest, thereby facilitating the generation of informative datasets. Furthermore, we theoretically prove that minimization of the cost function implies the space-filling property of the obtained data. Effectiveness of the proposed strategy is demonstrated on both an academic and a mass-spring-damper example, highlighting its potential practical impact on efficient exploration of the dynamics of nonlinear systems.
comment: Submitted to IEEE Control Systems Letters and CDC 2025
Learning Stable Koopman Embeddings for Identification and Control
This paper introduces new model parameterizations for learning discrete-time dynamical systems from data via the Koopman operator and studies their properties. Whereas most existing works on Koopman learning do not take into account the stability or stabilizability of the model -- two fundamental pieces of prior knowledge about a given system to be identified -- in this paper, we propose new classes of Koopman models that have built-in guarantees of these properties. These guarantees are achieved through a novel {\em direct parameterization approach} that leads to {\em unconstrained} optimization problems over their parameter sets. {These results rely on the invertibility of the vector fields for autonomous systems and the generalized feedback linearizability (under smooth feedback), respectively.} To explore the representational flexibility of these model sets, we establish the theoretical connections between the stability of discrete-time Koopman embedding and contraction-based forms of nonlinear stability and stabilizability. The proposed approach is illustrated in applications to stable nonlinear system identification and imitation learning via stabilizable models. Simulation results empirically show that the proposed learning approaches outperform prior methods lacking stability guarantees.
Inverse Inference on Cooperative Control of Networked Dynamical Systems
Recent years have witnessed the rapid advancement of understanding the control mechanism of networked dynamical systems (NDSs), which are governed by components such as nodal dynamics and topology. This paper reveals that the critical components in continuous-time state feedback cooperative control of NDSs can be inferred merely from discrete observations. In particular, we advocate a bi-level inference framework to estimate the global closed-loop system and extract the components, respectively. The novelty lies in bridging the gap from discrete observations to the continuous-time model and effectively decoupling the concerned components. Specifically, in the first level, we design a causality-based estimator for the discrete-time closed-loop system matrix, which can achieve asymptotically unbiased performance when the NDS is stable. In the second level, we introduce a matrix logarithm based method to recover the continuous-time counterpart matrix, providing new sampling period guarantees and establishing the recovery error bound. By utilizing graph properties of the NDS, we develop least square based procedures to decouple the concerned components with up to a scalar ambiguity. Furthermore, we employ inverse optimal control techniques to reconstruct the objective function driving the control process, deriving necessary conditions for the solutions. Numerical simulations demonstrate the effectiveness of the proposed method.
comment: 14 pages
Symbolic and User-friendly Geometric Algebra Routines (SUGAR) for Computations in Matlab
Geometric algebra (GA) is a mathematical tool for geometric computing, providing a framework that allows a unified and compact approach to geometric relations which in other mathematical systems are typically described using different more complicated elements. This fact has led to an increasing adoption of GA in applied mathematics and engineering problems. However, the scarcity of symbolic implementations of GA and its inherent complexity, requiring a specific mathematical background, make it challenging and less intuitive for engineers to work with. This prevents wider adoption among more applied professionals. To address this challenge, this paper introduces SUGAR (Symbolic and User-friendly Geometric Algebra Routines), an open-source toolbox designed for Matlab and licensed under the MIT License. SUGAR facilitates the translation of GA concepts into Matlab and provides a collection of user-friendly functions tailored for GA computations, including support for symbolic operations. It supports both numeric and symbolic computations in high-dimensional GAs. Specifically tailored for applied mathematics and engineering applications, SUGAR has been meticulously engineered to represent geometric elements and transformations within two and three-dimensional projective and conformal geometric algebras, aligning with established computational methodologies in the literature. Furthermore, SUGAR efficiently handles functions of multivectors, such as exponential, logarithmic, sinusoidal, and cosine functions, enhancing its applicability across various engineering domains, including robotics, control systems, and power electronics. Finally, this work includes four distinct validation examples, demonstrating SUGAR's capabilities across the above-mentioned fields and its practical utility in addressing real-world applied mathematics and engineering problems.
comment: 33 pages, 6 figures, journal paper accepted in ACM TOMS
Diffusion Models for Intelligent Transportation Systems: A Survey
Intelligent Transportation Systems (ITS) are vital in modern traffic management and optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models have emerged as transformative tools for addressing complex challenges within ITS. In this paper, we present a comprehensive survey of diffusion models for ITS, covering both theoretical and practical aspects. First, we introduce the theoretical foundations of diffusion models and their key variants, including conditional diffusion models and latent diffusion models, highlighting their suitability for modeling complex, multi-modal traffic data and enabling controllable generation. Second, we outline the primary challenges in ITS and the corresponding advantages of diffusion models, providing readers with a deeper understanding of the intersection between ITS and diffusion models. Third, we offer a multi-perspective investigation of current applications of diffusion models in ITS domains, including autonomous driving, traffic simulation, trajectory prediction, and traffic safety. Finally, we discuss state-of-the-art diffusion model techniques and highlight key ITS research directions that warrant further investigation. Through this structured overview, we aim to provide researchers with a comprehensive understanding of diffusion models for ITS, thereby advancing their future applications in the transportation domain.
comment: 7 figures
On Incremental Stability of Interconnected Switched Systems
In this paper, the incremental stability of interconnected switched nonlinear systems is discussed. The nature of switching considered is state-dependent. The incremental stability of the switched interconnected system is a stronger property compared to the conventional notion of stability. Even if individual systems in the interconnected setting are stable, guaranteeing stability for the overall system is challenging. However, one of the important features of incremental stability is that the notion is preserved over interconnection. Here, leveraging the contraction-theoretic tools, we derive a set of sufficient conditions for the overall interconnection consisting of bimodal switched systems. To showcase the wider usability of our proposed results, we have also included the effect of external input, which leads to the study of incremental input-to-state stability ($\delta$-ISS). For the case of feedback interconnection, the small gain characterisation is presented for the overall system's $\delta$-ISS. Further, for a special case of feedback, i.e., cascade interconnection, the results are derived. The derived conditions are based on the matrix measure, making it computationally tractable and general. To make the results more general, a generalised interconnection of bimodal switched systems is studied and corresponding sufficient condition for $\delta$-ISS are presented. Two numerical examples are demonstrated and supported with simulation results to verify the theoretical claims.
Fixed-Relative-Switched Threshold Strategies for Consensus Tracking Control of Nonlinear Multiagent Systems
This paper investigates event-triggered consensus tracking in nonlinear semi-strict-feedback multi-agent systems involving one leader and multiple followers. We first employ radial basis function neural networks and backstepping techniques to approximate the unknown nonlinear dynamics, facilitating the design of dual observers to measure the unknown states and disturbances. Then three adaptive event-triggered control schemes are proposed: fixed-threshold, relative-threshold, and switched-threshold configurations, each featuring specialized controller architectures and triggering mechanisms. Through Lyapunov stability analysis, we establish that the follower agents can asymptotically track the reference trajectory of the leader, meanwhile all error signals remain uniform bounded. Our proposed control strategies effectively prevent Zeno behaviors through stringent exclusion criteria. Finally, an illustrative example is presented, demonstrating the competitive performance of our control framework in achieving consensus tracking and optimizing triggering efficiency.
On the $H$-property for Step-graphons: Residual Case
We investigate the $H$-property for step-graphons. Specifically, we sample graphs $G_n$ on $n$ nodes from a step-graphon and evaluate the probability that $G_n$ has a Hamiltonian decomposition in the asymptotic regime as $n\to\infty$. It has been shown that for almost all step-graphons, this probability converges to either zero or one. We focus in this paper on the residual case where the zero-one law does not apply. We show that the limit of the probability still exists and provide an explicit expression of it. We present a complete proof of the result and validate it through numerical studies.
Higher-Order Meta Distribution Reliability Analysis of Wireless Networks
Communication reliability, as defined by 3GPP, refers to the probability of providing a desired quality of service (QoS). This metric is typically quantified for wireless networks by averaging the QoS success indicator over spatial and temporal random variables. Recently, the meta distribution (MD) has emerged as a two-level performance analysis tool for wireless networks, offering a detailed examination of the outer level (i.e., system-level) reliability versus the inner level (i.e., link-level) reliability thresholds. Most existing studies focus on first-order spatiotemporal MD reliability analyses, and the benefits of leveraging MD reliability for applications beyond this structure remain unexplored, a gap addressed in this paper. We propose a framework for the analysis of higher-order MD reliability of wireless networks considering different levels of temporal dynamicity of random elements in the network where the MD at each layer is leveraged to be used in calculating the MD of the higher layer. We then provide two applications for this framework and provide a detailed analytical and numerical study of the higher-order MD reliability for both examples. The results demonstrate the value of the hierarchical representation of MD reliability across three domains and the impact of the inner-layers target reliabilities on the overall MD reliability measure.
Systems and Control (EESS)
Approximation-free Control for Signal Temporal Logic Specifications using Spatiotemporal Tubes
This paper presents a spatiotemporal tube (STT)-based control framework for satisfying Signal Temporal Logic (STL) specifications in unknown control-affine systems. We formulate STL constraints as a robust optimization problem (ROP) and recast it as a scenario optimization program (SOP) to construct STTs with formal correctness guarantees. We also propose a closed-form control law that operates independently of the system dynamics, and ensures the system trajectory evolves within the STTs, thereby satisfying the STL specifications. The proposed approach is validated through case studies and comparisons with state-of-the-art methods, demonstrating superior computational efficiency, trajectory quality, and applicability to complex STL tasks.
Localization and path following for an autonomous e-scooter
In order to mitigate economical, ecological, and societal challenges in electric scooter (e-scooter) sharing systems, we develop an autonomous e-scooter prototype. Our vision is to design a fully autonomous prototype that can find its way to the next parking spot, high-demand area, or charging station. In this work, we propose a path following solution to enable localization and navigation in an urban environment with a provided path to follow. We design a closed-loop architecture that solves the localization and path following problem while allowing the e-scooter to maintain its balance with a previously developed reaction wheel mechanism. Our approach facilitates state and input constraints, e.g., adhering to the path width, while remaining executable on a Raspberry Pi 5. We demonstrate the efficacy of our approach in a real-world experiment on our prototype.
Optimal Microgrid Sizing of Offshore Renewable Energy Sources for Offshore Platforms and Coastal Communities
The global energy landscape is undergoing a transformative shift towards renewable energy and advanced storage solutions, driven by the urgent need for sustainable and resilient power systems. Isolated offshore communities, such as islands and offshore platforms, which traditionally rely on mainland grids or diesel generators, stand to gain significantly from renewable energy integration. Promising offshore renewable technologies include wind turbines, wave and tidal energy converters, and floating photovoltaic systems, paired with a storage solution like battery energy storage systems. This paper introduces a renewable energy microgrid optimizer (REMO), a tool designed to identify the optimal sizes of renewable generation and storage resources for offshore microgrids. A key challenge in such models is accurately accounting for battery degradation costs. To address this, the REMO model integrates a deep neural network-based battery degradation (DNN-BD) module, which factors in variables like ambient temperature, charge/discharge rates, state of charge, depth of discharge and battery health. Simulations on six test regions demonstrate that the REMO-DNN-BD approach minimizes lifetime energy costs while maintaining high reliability and sustainability, making it a viable design solution for offshore microgrid systems.
CV-MP: Max-Pressure Control in Heterogeneously Distributed and Partially Connected Vehicle Environments
Max-pressure (MP) control has emerged as a prominent real-time network traffic signal control strategy due to its simplicity, decentralized structure, and theoretical guarantees of network queue stability. Meanwhile, advances in connected vehicle (CV) technology have sparked extensive research into CV-based traffic signal control. Despite these developments, few studies have investigated MP control in heterogeneously distributed and partially CV environments while ensuring network queue stability. To address these research gaps, we propose a CV-based MP control (CV-MP) method that leverages real-time CV travel time information to compute the pressure, thereby incorporating both the spatial distribution and temporal delays of vehicles, unlike existing approaches that utilized only spatial distribution or temporal delays. In particular, we establish sufficient conditions for road network queue stability that are compatible with most existing MP control methods. Moreover, we pioneered the proof of network queue stability even if the vehicles are only partially connected and heterogeneously distributed, and gave a necessary condition of CV observation for maintaining the stability. Evaluation results on an Amsterdam corridor show that CV-MP significantly reduces vehicle delays compared to both actuated control and conventional MP control across various CV penetration rates. Moreover, in scenarios with dynamic traffic demand, CV-MP achieves lower spillover peaks even with low and heterogeneous CV penetration rates, further highlighting its effectiveness and robustness.
High Altitude Platform-Based Caching and Multicasting for Rural Connectivity
Providing efficient and reliable content delivery in rural areas remains a significant challenge due to the lack of communication infrastructure. To bridge the digital divide, this paper investigates the potential of leveraging multiple high-altitude platforms (HAPs) for energy-efficient content delivery in wide rural regions. Each caching-enabled HAP is equipped with both Free-Space Optical (FSO) transceivers for backhaul links and Radio Frequency (RF) antenna arrays for access links. To further enhance network efficiency, we consider a network coding-based multicasting scheme, where different types of content are treated as distinct multicast sessions. With the objective of minimizing long-term power cost, we propose a hierarchical framework that integrates deep reinforcement learn-ing (DRL) and convex optimization to jointly optimize dynamic caching strategies and resource allocation across the network. Simulation results demonstrate that our approach significantly reduces power cost compared to several baseline approaches, providing a practical solution for improving rural connectivity.
comment: 13 pages, 8 figures, submitted to IEEE journals for possible publication
Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network
Federated Learning (FL) has revolutionized collaborative model training in distributed networks, prioritizing data privacy and communication efficiency. This paper investigates efficient deployment of FL in wireless heterogeneous networks, focusing on strategies to accelerate convergence despite stragglers. The primary objective is to minimize long-term convergence wall-clock time through optimized user scheduling and resource allocation. While stragglers may introduce delays in a single round, their inclusion can expedite subsequent rounds, particularly when they possess critical information. Moreover, balancing single-round duration with the number of cumulative rounds, compounded by dynamic training and transmission conditions, necessitates a novel approach beyond conventional optimization solutions. To tackle these challenges, convergence analysis with respect to adaptive and biased scheduling is derived. Then, by factoring in real-time system and statistical information, including diverse energy constraints and users' energy harvesting capabilities, a deep reinforcement learning approach, empowered by proximal policy optimization, is employed to adaptively select user sets. For the scheduled users, Lagrangian decomposition is applied to optimize local resource utilization, further enhancing system efficiency. Simulation results validate the effectiveness and robustness of the proposed framework for various FL tasks, demonstrating reduced task time compared to existing benchmarks under various settings.
comment: 13 pages, 9 figures
LAPSO: A Unified Optimization View for Learning-Augmented Power System Operations
With the high penetration of renewables, traditional model-based power system operation is challenged to deliver economic, stable, and robust decisions. Machine learning has emerged as a powerful modeling tool for capturing complex dynamics to address these challenges. However, its separate design often lacks systematic integration with existing methods. To fill the gap, this paper proposes a holistic framework of Learning-Augmented Power System Operations (LAPSO, pronounced as Lap-So). Adopting a native optimization perspective, LAPSO is centered on the operation stage and aims to break the boundary between temporally siloed power system tasks, such as forecast, operation and control, while unifying the objectives of machine learning and model-based optimizations at both training and inference stages. Systematic analysis and simulations demonstrate the effectiveness of applying LAPSO in designing new integrated algorithms, such as stability-constrained optimization (SCO) and objective-based forecasting (OBF), while enabling end-to-end tracing of different sources of uncertainties. In addition, a dedicated Python package-lapso is introduced to automatically augment existing power system optimization models with learnable components. All code and data are available at https://github.com/xuwkk/lapso_exp.
Day-Ahead Bidding Strategies for Wind Farm Operators under a One-Price Balancing Scheme
We study day-ahead bidding strategies for wind farm operators under a one-price balancing scheme, prevalent in European electricity markets. In this setting, the profit-maximising strategy becomes an all-or-nothing strategy, aiming to take advantage of open positions in the balancing market. However, balancing prices are difficult, if not impossible, to forecast in the day-ahead stage and large open positions can affect the balancing price by changing the direction of the system imbalance. This paper addresses day-ahead bidding as a decision-making problem under uncertainty, with the objective of maximising the expected profit while reducing the imbalance risk related to the strategy. To this end, we develop a stochastic optimisation problem with explicit constraints on the positions in the balancing market, providing risk certificates, and derive an analytical solution to this problem. Moreover, we show how the price-impact of the trading strategy on the balancing market can be included in the ex-post evaluation. Using real data from the Belgian electricity market and an offshore wind farm in the North Sea, we demonstrate that the all-or-nothing strategy negatively impacts the balancing price, resulting in long-term losses for the wind farm. Our risk-constrained strategy, however, can still significantly enhance operational profit compared to traditional point-forecast bidding.
Predictive Control of EV Overnight Charging with Multi-Session Flexibility
The majority of electric vehicles (EVs) are charged domestically overnight, where the precise timing of power allocation is not important to the user, thus representing a source of flexibility that can be leveraged by charging control algorithms. In this paper, we relax the common assumption, that EVs require full charge every morning, enabling additional flexibility to defer charging of surplus energy to subsequent nights, which can enhance the performance of controlled charging. In particular, we consider a simple domestic smart plug, scheduling power delivery with the objective to minimize CO$_2$ emissions over prediction horizons of multiple sessions -- up to seven days ahead -- utilising model predictive control (MPC). Based on carbon intensity data from the UK National Grid, we demonstrate significant potential for emission reductions with multi-session planning of 40 to 46\% compared to uncontrolled charging and 19 to 26\% compared to single-session planning. Furthermore, we assess, how the driving and charging behaviour of EV users affects the available flexibility and consequentially the potential for emission reductions. Finally, using grid carbon intensity data from 14 different UK regions, we report significant variations in absolute emission reductions based on the local energy mix.
Enhanced Robust Tracking Control: An Online Learning Approach
This work focuses the tracking control problem for nonlinear systems subjected to unknown external disturbances. Inspired by contraction theory, a neural network-dirven CCM synthesis is adopted to obtain a feedback controller that could track any feasible trajectory. Based on the observation that the system states under continuous control input inherently contain embedded information about unknown external disturbances, we propose an online learning scheme that captures the disturbances dyanmics from online historical data and embeds the compensation within the CCM controller. The proposed scheme operates as a plug-and-play module that intrinsically enhances the tracking performance of CCM synthesis. The numerical simulations on tethered space robot and PVTOL demonstrate the effectiveness of proposed scheme. The source code of the proposed online learning scheme can be found at https://github.com/NPU-RCIR/Online_CCM.git.
Semi-Explicit Solution of Some Discrete-Time Mean-Field-Type Games with Higher-Order Costs
Traditional solvable game theory and mean-field-type game theory (risk-aware games) predominantly focus on quadratic costs due to their analytical tractability. Nevertheless, they often fail to capture critical non-linearities inherent in real-world systems. In this work, we present a unified framework for solving discrete-time game problems with higher-order state and strategy costs involving power-law terms. We derive semi-explicit expressions for equilibrium strategies, cost-to-go functions, and recursive coefficient dynamics across deterministic, stochastic, and multi-agent system settings by convex-completion techniques. The contributions include variance-aware solutions under additive and multiplicative noise, extensions to mean-field-type-dependent dynamics, and conditions that ensure the positivity of recursive coefficients. Our results provide a foundational methodology for analyzing non linear multi-agent systems under higher-order penalization, bridging classical game theory and mean-field-type game theory with modern computational tools for engineering applications.
comment: arXiv admin note: text overlap with arXiv:2505.04112
A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ("edge cases") that go beyond the autonomous capabilities - such as construction zones or encounters with emergency responders - the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, both in autonomous and remotely operated modes.
comment: Intelligent Vehicles Symposium 2025
LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture
This paper proposes a novel Large Vision-Language Model (LVLM) and Model Predictive Control (MPC) integration framework that delivers both task scalability and safety for Autonomous Driving (AD). LVLMs excel at high-level task planning across diverse driving scenarios. However, since these foundation models are not specifically designed for driving and their reasoning is not consistent with the feasibility of low-level motion planning, concerns remain regarding safety and smooth task switching. This paper integrates LVLMs with MPC Builder, which automatically generates MPCs on demand, based on symbolic task commands generated by the LVLM, while ensuring optimality and safety. The generated MPCs can strongly assist the execution or rejection of LVLM-driven task switching by providing feedback on the feasibility of the given tasks and generating task-switching-aware MPCs. Our approach provides a safe, flexible, and adaptable control framework, bridging the gap between cutting-edge foundation models and reliable vehicle operation. We demonstrate the effectiveness of our approach through a simulation experiment, showing that our system can safely and effectively handle highway driving while maintaining the flexibility and adaptability of LVLMs.
comment: 8 pages, 8 figures
Learning Linearized Models from Nonlinear Systems under Initialization Constraints with Finite Data
The identification of a linear system model from data has wide applications in control theory. The existing work that provides finite sample guarantees for linear system identification typically uses data from a single long system trajectory under i.i.d. random inputs, and assumes that the underlying dynamics is truly linear. In contrast, we consider the problem of identifying a linearized model when the true underlying dynamics is nonlinear, given that there is a certain constraint on the region where one can initialize the experiments. We provide a multiple trajectories-based deterministic data acquisition algorithm followed by a regularized least squares algorithm, and provide a finite sample error bound on the learned linearized dynamics. Our error bound shows that one can consistently learn the linearized dynamics, and demonstrates a trade-off between the error due to nonlinearity and the error due to noise. We validate our results through numerical experiments, where we also show the potential insufficiency of linear system identification using a single trajectory with i.i.d. random inputs, when nonlinearity does exist.
comment: 12 pages, 5 figurues. arXiv admin note: substantial text overlap with arXiv:2309.08805
Learning Economic Model Predictive Control via Clustering and Kernel-Based Lipschitz Regression
This paper presents a novel learning economic model predictive control scheme for uncertain nonlinear systems subject to input and state constraints and unknown dynamics. We design a fast and accurate Lipschitz regression method using input and output data that combines clustering and kernel regression to learn the unknown dynamics. In each cluster, the parallel convex optimization problems are solved to estimate the kernel weights and reduce the Lipschitz constant of the predictor, hence limiting the error propagation in the prediction horizon. We derive the two different bounds of learning errors in deterministic and probabilistic forms and customize a new robust constraint-tightening strategy for the discontinuous predictor. Then, the learning economic model predictive control algorithm is formulated by introducing a stabilized optimization problem to construct a Lyapunov function. Sufficient conditions are derived to ensure the recursive feasibility and input-to-state stability of the closed-loop system. The effectiveness of the proposed algorithm is verified by simulations of a numerical example and a continuously stirred tank reactor.
comment: Submitted to the Journal of the Franklin Institute; currently under review
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/continental/pred2plan.
Learning to Drive Anywhere with Model-Based Reannotation11
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
comment: 19 pages, 11 figures, 8 tables
Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars
Reinforcement Learning (RL) has enabled vast performance improvements for robotics systems. To achieve these results though, the agent often must randomly explore the environment, which for safety critical systems presents a significant challenge. Barrier functions can solve this challenge by enabling an override that approximates the RL control input as closely as possible without violating a safety constraint. Unfortunately, this override can be computationally intractable in cases where the dynamics are not convex in the control input or when time is discrete, as is often the case when training RL systems. We therefore consider these cases, developing novel barrier functions for two non-convex systems (fixed wing aircraft and self-driving cars performing lane merging with adaptive cruise control) in discrete time. Although solving for an online and optimal override is in general intractable when the dynamics are nonconvex in the control input, we investigate approximate solutions, finding that these approximations enable performance commensurate with baseline RL methods with zero safety violations. In particular, even without attempting to solve for the optimal override at all, performance is still competitive with baseline RL performance. We discuss the tradeoffs of the approximate override solutions including performance and computational tractability.
comment: This work has been submitted to the IEEE for possible publication
Model-Based Closed-Loop Control Algorithm for Stochastic Partial Differential Equation Control
Neural operators have demonstrated promise in modeling and controlling systems governed by Partial Differential Equations (PDEs). Beyond PDEs, Stochastic Partial Differential Equations (SPDEs) play a critical role in modeling systems influenced by randomness, with applications in finance, physics, and beyond. However, controlling SPDE-governed systems remains a significant challenge. On the one hand, the regularity of the system's state (which can be intuitively understood as smoothness) deteriorates, making modeling and generalization more challenging. On the other hand, this stochasticity also renders control more unstable and thus less accurate. To address this gap, we propose the Model-Based Closed-Loop Control Algorithm (MB-CC), the first model-based closed-loop control method for SPDEs. MB-CC introduces two key innovations to enhance control robustness and efficiency: a Regularity Feature (RF) block and a closed-loop strategy with an operator-encoded policy network. The RF block, inspired by the regularity structure theory of SPDEs, addresses noise-induced irregularities by transforming the network's input, including the system state and noise-perturbed external forces, into a refined feature space for improved forward prediction. Compared to previous works using regularity features, we introduce a new parameterization, data augmentation, and extend the RF block as a plug-and-play component. Additionally, to achieve closed-loop control, we introduce an operator-encoded policy network to map the current state to optimal control, which integrates physical priors and swiftly makes decisions based on states returned by the environment. We conduct a systematic evaluation of MB-CC on two notable SPDEs, showcasing its effectiveness and efficiency. The ablation studies show its ability to handle stochasticity more effectively.
Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization
Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictve Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively.
comment: 8 pages, 5 figures, to published in IEEE Sustech 2025
Fast Whole-Body Strain Regulation in Continuum Robots
We propose reaching steps towards the real-time strain control of multiphysics, multiscale continuum soft robots. To study this problem fundamentally, we ground ourselves in a model-based control setting enabled by mathematically precise dynamics of a soft robot prototype. Poised to integrate, rather than reject, inherent mechanical nonlinearities for embodied compliance, we first separate the original robot dynamics into separate subdynamics -- aided by a perturbing time-scale separation parameter. Second, we prescribe a set of stabilizing nonlinear backstepping controllers for regulating the resulting subsystems' strain dynamics. Third, we study the interconnected singularly perturbed system by analyzing and establishing its stability. Fourth, our theories are backed up by fast numerical results on a single arm of the Octopus robot arm. We demonstrate strain regulation to equilibrium, in a significantly reduced time, of the whole-body reduced-order dynamics of an infinite degrees-of-freedom soft robot. This paper communicates our thinking within the backdrop of embodied intelligence: it informs our conceptualization, formulation, computational setup, and yields improved control performance for infinite degrees-of-freedom soft robots.
Formulating the Restoration of Distribution Networks as a Multiple Traveling Salesman Problem
Severe weather events can cause extensive damage to electrical distribution networks, requiring a multi-day restoration effort. Optimizing the dispatch of repair crews minimizes the severe socio-economic consequences of such events. Considering both repair times and travel times, we use graphical manipulations to transform this multiple crew scheduling problem into a type of traveling salesman problem(TSP). Specifically, we demonstrate that the restoration problem bears major resemblance to an instance of a cost constrained reward maximizing mTSP (multiple TSP) on node and edge weighted (doubly weighted) graphs (a variant we dub the CCRM-mTSP-DW), where the objective is to maximize the aggregate reward earned during the upcoming restoration window, provided no crew violates its time budget and electrical continuity constraints are met. Despite the rich history of research on the TSP and its variants, this CCRM-mTSP-DW variant has not been studied before, although its closest cousin happens to be the "Selective TSP" (S-TSP). This reinterpretation of the restoration problem not only opens up the possibility of drawing on existing solution methods developed for the TSP and its variants, it also adds a new chapter in the annals of research on "TSP-like'' problems. In this paper, we propose a "TSP-like'' mixed integer linear programming (MILP) model for solving the restoration problem and validate it on the IEEE PES 123-node test feeder network.
comment: This is the JOURNAL version (10 pages, 9 figures, 1 table) of the paper. The previous versions (v1 & v2) are the extended version (26 pages, 17 figures, 1 table)
On Space-Filling Input Design for Nonlinear Dynamic Model Learning: A Gaussian Process Approach
While optimal input design for linear systems has been well-established, no systematic approach exists for nonlinear systems where robustness to extrapolation/interpolation errors is prioritized over minimizing estimated parameter variance. To address this issue, we develop a novel space-filling input design strategy for nonlinear system identification that ensures data coverage of a given region of interest. By placing a Gaussian Process (GP) prior on the joint input-state space, the proposed strategy leverages the GP posterior variance to construct a cost function that promotes space-filling input design. Consequently, this enables maximization of the coverage in the region of interest, thereby facilitating the generation of informative datasets. Furthermore, we theoretically prove that minimization of the cost function implies the space-filling property of the obtained data. Effectiveness of the proposed strategy is demonstrated on both an academic and a mass-spring-damper example, highlighting its potential practical impact on efficient exploration of the dynamics of nonlinear systems.
comment: Submitted to IEEE Control Systems Letters and CDC 2025
Learning Stable Koopman Embeddings for Identification and Control
This paper introduces new model parameterizations for learning discrete-time dynamical systems from data via the Koopman operator and studies their properties. Whereas most existing works on Koopman learning do not take into account the stability or stabilizability of the model -- two fundamental pieces of prior knowledge about a given system to be identified -- in this paper, we propose new classes of Koopman models that have built-in guarantees of these properties. These guarantees are achieved through a novel {\em direct parameterization approach} that leads to {\em unconstrained} optimization problems over their parameter sets. {These results rely on the invertibility of the vector fields for autonomous systems and the generalized feedback linearizability (under smooth feedback), respectively.} To explore the representational flexibility of these model sets, we establish the theoretical connections between the stability of discrete-time Koopman embedding and contraction-based forms of nonlinear stability and stabilizability. The proposed approach is illustrated in applications to stable nonlinear system identification and imitation learning via stabilizable models. Simulation results empirically show that the proposed learning approaches outperform prior methods lacking stability guarantees.
Inverse Inference on Cooperative Control of Networked Dynamical Systems
Recent years have witnessed the rapid advancement of understanding the control mechanism of networked dynamical systems (NDSs), which are governed by components such as nodal dynamics and topology. This paper reveals that the critical components in continuous-time state feedback cooperative control of NDSs can be inferred merely from discrete observations. In particular, we advocate a bi-level inference framework to estimate the global closed-loop system and extract the components, respectively. The novelty lies in bridging the gap from discrete observations to the continuous-time model and effectively decoupling the concerned components. Specifically, in the first level, we design a causality-based estimator for the discrete-time closed-loop system matrix, which can achieve asymptotically unbiased performance when the NDS is stable. In the second level, we introduce a matrix logarithm based method to recover the continuous-time counterpart matrix, providing new sampling period guarantees and establishing the recovery error bound. By utilizing graph properties of the NDS, we develop least square based procedures to decouple the concerned components with up to a scalar ambiguity. Furthermore, we employ inverse optimal control techniques to reconstruct the objective function driving the control process, deriving necessary conditions for the solutions. Numerical simulations demonstrate the effectiveness of the proposed method.
comment: 14 pages
Symbolic and User-friendly Geometric Algebra Routines (SUGAR) for Computations in Matlab
Geometric algebra (GA) is a mathematical tool for geometric computing, providing a framework that allows a unified and compact approach to geometric relations which in other mathematical systems are typically described using different more complicated elements. This fact has led to an increasing adoption of GA in applied mathematics and engineering problems. However, the scarcity of symbolic implementations of GA and its inherent complexity, requiring a specific mathematical background, make it challenging and less intuitive for engineers to work with. This prevents wider adoption among more applied professionals. To address this challenge, this paper introduces SUGAR (Symbolic and User-friendly Geometric Algebra Routines), an open-source toolbox designed for Matlab and licensed under the MIT License. SUGAR facilitates the translation of GA concepts into Matlab and provides a collection of user-friendly functions tailored for GA computations, including support for symbolic operations. It supports both numeric and symbolic computations in high-dimensional GAs. Specifically tailored for applied mathematics and engineering applications, SUGAR has been meticulously engineered to represent geometric elements and transformations within two and three-dimensional projective and conformal geometric algebras, aligning with established computational methodologies in the literature. Furthermore, SUGAR efficiently handles functions of multivectors, such as exponential, logarithmic, sinusoidal, and cosine functions, enhancing its applicability across various engineering domains, including robotics, control systems, and power electronics. Finally, this work includes four distinct validation examples, demonstrating SUGAR's capabilities across the above-mentioned fields and its practical utility in addressing real-world applied mathematics and engineering problems.
comment: 33 pages, 6 figures, journal paper accepted in ACM TOMS
Diffusion Models for Intelligent Transportation Systems: A Survey
Intelligent Transportation Systems (ITS) are vital in modern traffic management and optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models have emerged as transformative tools for addressing complex challenges within ITS. In this paper, we present a comprehensive survey of diffusion models for ITS, covering both theoretical and practical aspects. First, we introduce the theoretical foundations of diffusion models and their key variants, including conditional diffusion models and latent diffusion models, highlighting their suitability for modeling complex, multi-modal traffic data and enabling controllable generation. Second, we outline the primary challenges in ITS and the corresponding advantages of diffusion models, providing readers with a deeper understanding of the intersection between ITS and diffusion models. Third, we offer a multi-perspective investigation of current applications of diffusion models in ITS domains, including autonomous driving, traffic simulation, trajectory prediction, and traffic safety. Finally, we discuss state-of-the-art diffusion model techniques and highlight key ITS research directions that warrant further investigation. Through this structured overview, we aim to provide researchers with a comprehensive understanding of diffusion models for ITS, thereby advancing their future applications in the transportation domain.
comment: 7 figures
On Incremental Stability of Interconnected Switched Systems
In this paper, the incremental stability of interconnected switched nonlinear systems is discussed. The nature of switching considered is state-dependent. The incremental stability of the switched interconnected system is a stronger property compared to the conventional notion of stability. Even if individual systems in the interconnected setting are stable, guaranteeing stability for the overall system is challenging. However, one of the important features of incremental stability is that the notion is preserved over interconnection. Here, leveraging the contraction-theoretic tools, we derive a set of sufficient conditions for the overall interconnection consisting of bimodal switched systems. To showcase the wider usability of our proposed results, we have also included the effect of external input, which leads to the study of incremental input-to-state stability ($\delta$-ISS). For the case of feedback interconnection, the small gain characterisation is presented for the overall system's $\delta$-ISS. Further, for a special case of feedback, i.e., cascade interconnection, the results are derived. The derived conditions are based on the matrix measure, making it computationally tractable and general. To make the results more general, a generalised interconnection of bimodal switched systems is studied and corresponding sufficient condition for $\delta$-ISS are presented. Two numerical examples are demonstrated and supported with simulation results to verify the theoretical claims.
Fixed-Relative-Switched Threshold Strategies for Consensus Tracking Control of Nonlinear Multiagent Systems
This paper investigates event-triggered consensus tracking in nonlinear semi-strict-feedback multi-agent systems involving one leader and multiple followers. We first employ radial basis function neural networks and backstepping techniques to approximate the unknown nonlinear dynamics, facilitating the design of dual observers to measure the unknown states and disturbances. Then three adaptive event-triggered control schemes are proposed: fixed-threshold, relative-threshold, and switched-threshold configurations, each featuring specialized controller architectures and triggering mechanisms. Through Lyapunov stability analysis, we establish that the follower agents can asymptotically track the reference trajectory of the leader, meanwhile all error signals remain uniform bounded. Our proposed control strategies effectively prevent Zeno behaviors through stringent exclusion criteria. Finally, an illustrative example is presented, demonstrating the competitive performance of our control framework in achieving consensus tracking and optimizing triggering efficiency.
On the $H$-property for Step-graphons: Residual Case
We investigate the $H$-property for step-graphons. Specifically, we sample graphs $G_n$ on $n$ nodes from a step-graphon and evaluate the probability that $G_n$ has a Hamiltonian decomposition in the asymptotic regime as $n\to\infty$. It has been shown that for almost all step-graphons, this probability converges to either zero or one. We focus in this paper on the residual case where the zero-one law does not apply. We show that the limit of the probability still exists and provide an explicit expression of it. We present a complete proof of the result and validate it through numerical studies.
Higher-Order Meta Distribution Reliability Analysis of Wireless Networks
Communication reliability, as defined by 3GPP, refers to the probability of providing a desired quality of service (QoS). This metric is typically quantified for wireless networks by averaging the QoS success indicator over spatial and temporal random variables. Recently, the meta distribution (MD) has emerged as a two-level performance analysis tool for wireless networks, offering a detailed examination of the outer level (i.e., system-level) reliability versus the inner level (i.e., link-level) reliability thresholds. Most existing studies focus on first-order spatiotemporal MD reliability analyses, and the benefits of leveraging MD reliability for applications beyond this structure remain unexplored, a gap addressed in this paper. We propose a framework for the analysis of higher-order MD reliability of wireless networks considering different levels of temporal dynamicity of random elements in the network where the MD at each layer is leveraged to be used in calculating the MD of the higher layer. We then provide two applications for this framework and provide a detailed analytical and numerical study of the higher-order MD reliability for both examples. The results demonstrate the value of the hierarchical representation of MD reliability across three domains and the impact of the inner-layers target reliabilities on the overall MD reliability measure.
Robotics
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Vision is well-known for its use in manipulation, especially using visual servoing. To make it robust, multiple cameras are needed to expand the field of view. That is computationally challenging. Merging multiple views and using Q-learning allows the design of more effective representations and optimization of sample efficiency. Such a solution might be expensive to deploy. To mitigate this, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For project website and code, see https://aalmuzairee.github.io/mad
comment: For project website and code, see https://aalmuzairee.github.io/mad
Modeling Personalized Difficulty of Rehabilitation Exercises Using Causal Trees
Rehabilitation robots are often used in game-like interactions for rehabilitation to increase a person's motivation to complete rehabilitation exercises. By adjusting exercise difficulty for a specific user throughout the exercise interaction, robots can maximize both the user's rehabilitation outcomes and the their motivation throughout the exercise. Previous approaches have assumed exercises have generic difficulty values that apply to all users equally, however, we identified that stroke survivors have varied and unique perceptions of exercise difficulty. For example, some stroke survivors found reaching vertically more difficult than reaching farther but lower while others found reaching farther more challenging than reaching vertically. In this paper, we formulate a causal tree-based method to calculate exercise difficulty based on the user's performance. We find that this approach accurately models exercise difficulty and provides a readily interpretable model of why that exercise is difficult for both users and caretakers.
comment: Accepted to IEEE/RAS-EMBS International Conference on Rehabilitation Robotics (ICORR 2025)
Stow: Robotic Packing of Items into Fabric Pods
This paper presents a compliant manipulation system capable of placing items onto densely packed shelves. The wide diversity of items and strict business requirements for high producing rates and low defect generation have prohibited warehouse robotics from performing this task. Our innovations in hardware, perception, decision-making, motion planning, and control have enabled this system to perform over 500,000 stows in a large e-commerce fulfillment center. The system achieves human levels of packing density and speed while prioritizing work on overhead shelves to enhance the safety of humans working alongside the robots.
Hierarchical Task Decomposition for Execution Monitoring and Error Recovery: Understanding the Rationale Behind Task Demonstrations
Multi-step manipulation tasks where robots interact with their environment and must apply process forces based on the perceived situation remain challenging to learn and prone to execution errors. Accurately simulating these tasks is also difficult. Hence, it is crucial for robust task performance to learn how to coordinate end-effector pose and applied force, monitor execution, and react to deviations. To address these challenges, we propose a learning approach that directly infers both low- and high-level task representations from user demonstrations on the real system. We developed an unsupervised task segmentation algorithm that combines intention recognition and feature clustering to infer the skills of a task. We leverage the inferred characteristic features of each skill in a novel unsupervised anomaly detection approach to identify deviations from the intended task execution. Together, these components form a comprehensive framework capable of incrementally learning task decisions and new behaviors as new situations arise. Compared to state-of-the-art learning techniques, our approach significantly reduces the required amount of training data and computational complexity while efficiently learning complex in-contact behaviors and recovery strategies. Our proposed task segmentation and anomaly detection approaches outperform state-of-the-art methods on force-based tasks evaluated on two different robotic systems.
comment: The paper has been accepted for publication by the International Journal of Robotics Research (IJRR), 26 pages
Accelerating Audio Research with Robotic Dummy Heads SP
This work introduces a robotic dummy head that fuses the acoustic realism of conventional audiological mannequins with the mobility of robots. The proposed device is capable of moving, talking, and listening as people do, and can be used to automate spatially-stationary audio experiments, thus accelerating the pace of audio research. Critically, the device may also be used as a moving sound source in dynamic experiments, due to its quiet motor. This feature differentiates our work from previous robotic acoustic research platforms. Validation that the robot enables high quality audio data collection is provided through various experiments and acoustic measurements. These experiments also demonstrate how the robot might be used to study adaptive binaural beamforming. Design files are provided as open-source to stimulate novel audio research.
comment: WASPAA 2025
Model-Based AI planning and Execution Systems for Robotics
Model-based planning and execution systems offer a principled approach to building flexible autonomous robots that can perform diverse tasks by automatically combining a host of basic skills. This idea is almost as old as modern robotics. Yet, while diverse general-purpose reasoning architectures have been proposed since, general-purpose systems that are integrated with modern robotic platforms have emerged only recently, starting with the influential ROSPlan system. Since then, a growing number of model-based systems for robot task-level control have emerged. In this paper, we consider the diverse design choices and issues existing systems attempt to address, the different solutions proposed so far, and suggest avenues for future development.
Estimating Dynamic Soft Continuum Robot States From Boundaries
Accurate state estimation is essential for effective control of robots. For soft robots, this task is particularly challenging because their states are inherently infinite-dimensional functions due to the robots' continuous deformability. Traditional sensing techniques, however, can only provide discrete measurements. Recently, a dynamic state estimation method known as a boundary observer was introduced, which leverages Cosserat rod theory to recover all infinite-dimensional states by measuring only the velocity twist at the robot's tip. In this work, we present a novel boundary observer that can also recover infinite-dimensional dynamic states, but instead relies on measuring the internal wrench at the robot's base. This design exploits the duality between the velocity twist at the tip and the internal wrench at the base, with both types of boundary observers being inspired by principles of energy dissipation. Despite the mathematical duality, the proposed approach offers a distinct advantage: it requires only a 6-axis force/torque sensor embedded at the base, eliminating the need for external sensing systems such as motion capture cameras. Moreover, combining both tip- and base-based techniques enhances energy dissipation, accelerates convergence, and improves estimation accuracy. We validate the proposed algorithms through both simulation studies and experiments based on tendon-driven continuum robots. Our results demonstrate that all boundary observers converge to the ground truth within 3 seconds, even with significantly deviated initial conditions. Furthermore, they recover from unknown perturbations and effectively track high-frequency vibrations. We also show that combining the dual techniques further improves convergence speed and accuracy. Finally, the computational efficiency of these algorithms indicates their feasibility for real-time state estimation.
TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution
Trajectory prediction is a crucial task in modeling human behavior, especially in fields as social robotics and autonomous vehicle navigation. Traditional heuristics based on handcrafted rules often lack accuracy, while recently proposed deep learning approaches suffer from computational cost, lack of explainability, and generalization issues that limit their practical adoption. In this paper, we introduce TrajEvo, a framework that leverages Large Language Models (LLMs) to automatically design trajectory prediction heuristics. TrajEvo employs an evolutionary algorithm to generate and refine prediction heuristics from past trajectory data. We introduce a Cross-Generation Elite Sampling to promote population diversity and a Statistics Feedback Loop allowing the LLM to analyze alternative predictions. Our evaluations show TrajEvo outperforms previous heuristic methods on the ETH-UCY datasets, and remarkably outperforms both heuristics and deep learning methods when generalizing to the unseen SDD dataset. TrajEvo represents a first step toward automated design of fast, explainable, and generalizable trajectory prediction heuristics. We make our source code publicly available to foster future research at https://github.com/ai4co/trajevo.
Do We Still Need to Work on Odometry for Autonomous Driving? ICRA 2025
Over the past decades, a tremendous amount of work has addressed the topic of ego-motion estimation of moving platforms based on various proprioceptive and exteroceptive sensors. At the cost of ever-increasing computational load and sensor complexity, odometry algorithms have reached impressive levels of accuracy with minimal drift in various conditions. In this paper, we question the need for more research on odometry for autonomous driving by assessing the accuracy of one of the simplest algorithms: the direct integration of wheel encoder data and yaw rate measurements from a gyroscope. We denote this algorithm as Odometer-Gyroscope (OG) odometry. This work shows that OG odometry can outperform current state-of-the-art radar-inertial SE(2) odometry for a fraction of the computational cost in most scenarios. For example, the OG odometry is on top of the Boreas leaderboard with a relative translation error of 0.20%, while the second-best method displays an error of 0.26%. Lidar-inertial approaches can provide more accurate estimates, but the computational load is three orders of magnitude higher than the OG odometry. To further the analysis, we have pushed the limits of the OG odometry by purposely violating its fundamental no-slip assumption using data collected during a heavy snowstorm with different driving behaviours. Our conclusion shows that a significant amount of slippage is required to result in non-satisfactory pose estimates from the OG odometry.
comment: ICRA 2025, Field Robotics workshop
A Case Study on the Application of Digital Twins for Enhancing CPS Operations
To ensure the availability and reduce the downtime of complex cyber-physical systems across different domains, e.g., agriculture and manufacturing, fault tolerance mechanisms are implemented which are complex in both their development and operation. In addition, cyber-physical systems are often confronted with limited hardware resources or are legacy systems, both often hindering the addition of new functionalities directly on the onboard hardware. Digital Twins can be adopted to offload expensive computations, as well as providing support through fault tolerance mechanisms, thus decreasing costs and operational downtime of cyber-physical systems. In this paper, we show the feasibility of a Digital Twin used for enhancing cyber-physical system operations, specifically through functional augmentation and increased fault tolerance, in an industry-oriented use case.
comment: In Proceedings ASQAP 2025, arXiv:2505.02873
RGB-Event Fusion with Self-Attention for Collision Prediction
Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. This paper proposes a neural network framework for predicting the time and collision position of an unmanned aerial vehicle with a dynamic object, using RGB and event-based vision sensors. The proposed architecture consists of two separate encoder branches, one for each modality, followed by fusion by self-attention to improve prediction accuracy. To facilitate benchmarking, we leverage the ABCD [8] dataset collected that enables detailed comparisons of single-modality and fusion-based approaches. At the same prediction throughput of 50Hz, the experimental results show that the fusion-based model offers an improvement in prediction accuracy over single-modality approaches of 1% on average and 10% for distances beyond 0.5m, but comes at the cost of +71% in memory and + 105% in FLOPs. Notably, the event-based model outperforms the RGB model by 4% for position and 26% for time error at a similar computational cost, making it a competitive alternative. Additionally, we evaluate quantized versions of the event-based models, applying 1- to 8-bit quantization to assess the trade-offs between predictive performance and computational efficiency. These findings highlight the trade-offs of multi-modal perception using RGB and event-based cameras in robotic applications.
Automating Box Folding: Sequence Extraction and Ranking Methodologies
Box folding represents a crucial challenge for automated packaging systems. This work bridges the gap between existing methods for folding sequence extraction and approaches focused on the adaptability of automated systems to specific box types. An innovative method is proposed to identify and rank folding sequences, enabling the transformation of a box from an initial state to a desired final configuration. The system evaluates and ranks these sequences based on their feasibility and compatibility with available hardware, providing recommendations for real-world implementations. Finally, an illustrative use case is presented, where a robot performs the folding of a box.
comment: Accepted at IFAC ROBOTICS 2025
Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections
Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging global perception and vehicle-to-infrastructure (V2I) communication. The core of the system is an RSU-based decision-making module using a two-stage hybrid reinforcement learning (RL) framework. At first, policies are pre-trained offline using conservative Q-learning (CQL) combined with behavior cloning (BC) on collected dataset. Subsequently, these policies are fine-tuned in the simulation using multi-agent proximal policy optimization (MAPPO), aligned with a self-attention mechanism to effectively solve inter-agent dependencies. RSUs perform real-time inference based on the trained models to realize vehicle control via V2I communications. Extensive experiments in CARLA environment demonstrate high effectiveness of the proposed system, by: \textit{(i)} achieving failure rates below 0.03\% in coordinating three connected and autonomous vehicles (CAVs) through complex intersection scenarios, significantly outperforming the traditional Autoware control method, and \textit{(ii)} exhibiting strong robustness across varying numbers of controlled agents and shows promising generalization capabilities on other maps.
comment: 7 pages
Low Resolution Next Best View for Robot Packing
Automating the packing of objects with robots is a key challenge in industrial automation, where efficient object perception plays a fundamental role. This paper focuses on scenarios where precise 3D reconstruction is not required, prioritizing cost-effective and scalable solutions. The proposed Low-Resolution Next Best View (LR-NBV) algorithm leverages a utility function that balances pose redundancy and acquisition density, ensuring efficient object reconstruction. Experimental validation demonstrates that LR-NBV consistently outperforms standard NBV approaches, achieving comparable accuracy with significantly fewer poses. This method proves highly suitable for applications requiring efficiency, scalability, and adaptability without relying on high-precision sensing.
comment: Paper accepted at IFAC ROBOTICS 2025
Trajectory Entropy Reinforcement Learning for Predictable and Robust Control
Simplicity is a critical inductive bias for designing data-driven controllers, especially when robustness is important. Despite the impressive results of deep reinforcement learning in complex control tasks, it is prone to capturing intricate and spurious correlations between observations and actions, leading to failure under slight perturbations to the environment. To tackle this problem, in this work we introduce a novel inductive bias towards simple policies in reinforcement learning. The simplicity inductive bias is introduced by minimizing the entropy of entire action trajectories, corresponding to the number of bits required to describe information in action trajectories after the agent observes state trajectories. Our reinforcement learning agent, Trajectory Entropy Reinforcement Learning, is optimized to minimize the trajectory entropy while maximizing rewards. We show that the trajectory entropy can be effectively estimated by learning a variational parameterized action prediction model, and use the prediction model to construct an information-regularized reward function. Furthermore, we construct a practical algorithm that enables the joint optimization of models, including the policy and the prediction model. Experimental evaluations on several high-dimensional locomotion tasks show that our learned policies produce more cyclical and consistent action trajectories, and achieve superior performance, and robustness to noise and dynamic changes than the state-of-the-art.
comment: 10 pages
Beyond Task Performance: Human Experience in Human-Robot Collaboration
Human interaction experience plays a crucial role in the effectiveness of human-machine collaboration, especially as interactions in future systems progress towards tighter physical and functional integration. While automation design has been shown to impact task performance, its influence on human experience metrics such as flow, sense of agency (SoA), and embodiment remains underexplored. This study investigates how variations in automation design affect these psychological experience measures and examines correlations between subjective experience and physiological indicators. A user study was conducted in a simulated wood workshop, where participants collaborated with a lightweight robot under four automation levels. The results of the study indicate that medium automation levels enhance flow, SoA and embodiment, striking a balance between support and user autonomy. In contrast, higher automation, despite optimizing task performance, diminishes perceived flow and agency. Furthermore, we observed that grip force might be considered as a real-time proxy of SoA, while correlations with heart rate variability were inconclusive. The findings underscore the necessity for automation strategies that integrate human- centric metrics, aiming to optimize both performance and user experience in collaborative robotic systems
SCU-Hand: Soft Conical Universal Robotic Hand for Scooping Granular Media from Containers of Various Sizes ICRA2025
Automating small-scale experiments in materials science presents challenges due to the heterogeneous nature of experimental setups. This study introduces the SCU-Hand (Soft Conical Universal Robot Hand), a novel end-effector designed to automate the task of scooping powdered samples from various container sizes using a robotic arm. The SCU-Hand employs a flexible, conical structure that adapts to different container geometries through deformation, maintaining consistent contact without complex force sensing or machine learning-based control methods. Its reconfigurable mechanism allows for size adjustment, enabling efficient scooping from diverse container types. By combining soft robotics principles with a sheet-morphing design, our end-effector achieves high flexibility while retaining the necessary stiffness for effective powder manipulation. We detail the design principles, fabrication process, and experimental validation of the SCU-Hand. Experimental validation showed that the scooping capacity is about 20% higher than that of a commercial tool, with a scooping performance of more than 95% for containers of sizes between 67 mm to 110 mm. This research contributes to laboratory automation by offering a cost-effective, easily implementable solution for automating tasks such as materials synthesis and characterization processes.
comment: 2025 IEEE International Conference on Robotics and Automation (ICRA2025). Preprint. Accepted January 2025
NAMO-LLM: Efficient Navigation Among Movable Obstacles with Large Language Model Guidance
Several planners have been proposed to compute robot paths that reach desired goal regions while avoiding obstacles. However, these methods fail when all pathways to the goal are blocked. In such cases, the robot must reason about how to reconfigure the environment to access task-relevant regions - a problem known as Navigation Among Movable Objects (NAMO). While various solutions to this problem have been developed, they often struggle to scale to highly cluttered environments. To address this, we propose NAMO-LLM, a sampling-based planner that searches over robot and obstacle configurations to compute feasible plans specifying which obstacles to move, where, and in what order. Its key novelty is a non-uniform sampling strategy guided by Large Language Models (LLMs) biasing the tree construction toward directions more likely to yield a solution. We show that NAMO-LLM is probabilistically complete and demonstrate through experiments that it efficiently scales to cluttered environments, outperforming related works in both runtime and plan quality.
comment: 10 pages, 6 figures
Scalable Aerial GNSS Localization for Marine Robots
Accurate localization is crucial for water robotics, yet traditional onboard Global Navigation Satellite System (GNSS) approaches are difficult or ineffective due to signal reflection on the water's surface and its high cost of aquatic GNSS receivers. Existing approaches, such as inertial navigation, Doppler Velocity Loggers (DVL), SLAM, and acoustic-based methods, face challenges like error accumulation and high computational complexity. Therefore, a more efficient and scalable solution remains necessary. This paper proposes an alternative approach that leverages an aerial drone equipped with GNSS localization to track and localize a marine robot once it is near the surface of the water. Our results show that this novel adaptation enables accurate single and multi-robot marine robot localization.
comment: International Conference on Robotics and Automation 2025 Workshop Robots in the Wild
Steerable Scene Generation with Post Training and Inference-Time Search
Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types. We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of over 44 million SE(3) scenes spanning five diverse environments. Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/
comment: Project website: https://steerable-scene-generation.github.io/
Data-Dependent Hidden Markov Model with Off-Road State Determination and Real-Time Viterbi Algorithm for Lane Determination in Autonomous Vehicles
Lane determination and lane sequence determination are important components for many Connected and Automated Vehicle (CAV) applications. Lane determination has been solved using Hidden Markov Model (HMM) among other methods. The existing HMM literature for lane sequence determination uses empirical definitions with user-modified parameters to calculate HMM probabilities. The probability definitions in the literature can cause breaks in the HMM due to the inability to directly calculate probabilities of off-road positions, requiring post-processing of data. This paper develops a time-varying HMM using the physical properties of the roadway and vehicle, and the stochastic properties of the sensors. This approach yields emission and transition probability models conditioned on the sensor data without parameter tuning. It also accounts for the probability that the vehicle is not in any roadway lane (e.g., on the shoulder or making a U-turn), which eliminates the need for post-processing to deal with breaks in the HMM processing. This approach requires adapting the Viterbi algorithm and the HMM to be conditioned on the sensor data, which are then used to generate the most-likely sequence of lanes the vehicle has traveled. The proposed approach achieves an average accuracy of 95.9%. Compared to the existing literature, this provides an average increase of 2.25% by implementing the proposed transition probability and an average increase of 5.1% by implementing both the proposed transition and emission probabilities.
comment: 15 pages, 5 figures, 5 tables, Journal
Geometric Fault-Tolerant Neural Network Tracking Control of Unknown Systems on Matrix Lie Groups
We present a geometric neural network-based tracking controller for systems evolving on matrix Lie groups under unknown dynamics, actuator faults, and bounded disturbances. Leveraging the left-invariance of the tangent bundle of matrix Lie groups, viewed as an embedded submanifold of the vector space $\R^{N\times N}$, we propose a set of learning rules for neural network weights that are intrinsically compatible with the Lie group structure and do not require explicit parameterization. Exploiting the geometric properties of Lie groups, this approach circumvents parameterization singularities and enables a global search for optimal weights. The ultimate boundedness of all error signals -- including the neural network weights, the coordinate-free configuration error function, and the tracking velocity error -- is established using Lyapunov's direct method. To validate the effectiveness of the proposed method, we provide illustrative simulation results for decentralized formation control of multi-agent systems on the Special Euclidean group.
Fitts' List Revisited: An Empirical Study on Function Allocation in a Two-Agent Physical Human-Robot Collaborative Position/Force Task
In this letter, we investigate whether the classical function allocation holds for physical Human-Robot Collaboration, which is important for providing insights for Industry 5.0 to guide how to best augment rather than replace workers. This study empirically tests the applicability of Fitts' List within physical Human-Robot Collaboration, by conducting a user study (N=26, within-subject design) to evaluate four distinct allocations of position/force control between human and robot in an abstract blending task. We hypothesize that the function in which humans control the position achieves better performance and receives higher user ratings. When allocating position control to the human and force control to the robot, compared to the opposite case, we observed a significant improvement in preventing overblending. This was also perceived better in terms of physical demand and overall system acceptance, while participants experienced greater autonomy, more engagement and less frustration. An interesting insight was that the supervisory role (when the robot controls both position and force control) was rated second best in terms of subjective acceptance. Another surprising insight was that if position control was delegated to the robot, the participants perceived much lower autonomy than when the force control was delegated to the robot. These findings empirically support applying Fitts' principles to static function allocation for physical collaboration, while also revealing important nuanced user experience trade-offs, particularly regarding perceived autonomy when delegating position control.
comment: 10 pages, 6 figures, under review for publication in IEEE Robotics and Automation Letters (RA-L)
Efficient Sensorimotor Learning for Open-world Robot Manipulation
This dissertation considers Open-world Robot Manipulation, a manipulation problem where a robot must generalize or quickly adapt to new objects, scenes, or tasks for which it has not been pre-programmed or pre-trained. This dissertation tackles the problem using a methodology of efficient sensorimotor learning. The key to enabling efficient sensorimotor learning lies in leveraging regular patterns that exist in limited amounts of demonstration data. These patterns, referred to as ``regularity,'' enable the data-efficient learning of generalizable manipulation skills. This dissertation offers a new perspective on formulating manipulation problems through the lens of regularity. Building upon this notion, we introduce three major contributions. First, we introduce methods that endow robots with object-centric priors, allowing them to learn generalizable, closed-loop sensorimotor policies from a small number of teleoperation demonstrations. Second, we introduce methods that constitute robots' spatial understanding, unlocking their ability to imitate manipulation skills from in-the-wild video observations. Last but not least, we introduce methods that enable robots to identify reusable skills from their past experiences, resulting in systems that can continually imitate multiple tasks in a sequential manner. Altogether, the contributions of this dissertation help lay the groundwork for building general-purpose personal robots that can quickly adapt to new situations or tasks with low-cost data collection and interact easily with humans. By enabling robots to learn and generalize from limited data, this dissertation takes a step toward realizing the vision of intelligent robotic assistants that can be seamlessly integrated into everyday scenarios.
comment: Ph.D. Dissertation
Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
Functional grasp is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. However, most prior work either focuses on power grasping, which simply involves holding an object still, or relies on costly teleoperated robot demonstrations to teach robots how to grasp each object functionally. Instead, we propose extracting human grasp information from web images since they depict natural and functional object interactions, thereby bypassing the need for curated demonstrations. We reconstruct human hand-object interaction (HOI) 3D meshes from RGB images, retarget the human hand to multi-finger robot hands, and align the noisy object mesh with its accurate 3D shape. We show that these relatively low-quality HOI data from inexpensive web sources can effectively train a functional grasping model. To further expand the grasp dataset for seen and unseen objects, we use the initially-trained grasping policy with web data in the IsaacGym simulator to generate physically feasible grasps while preserving functionality. We train the grasping model on 10 object categories and evaluate it on 9 unseen objects, including challenging items such as syringes, pens, spray bottles, and tongs, which are underrepresented in existing datasets. The model trained on the web HOI dataset, achieving a 75.8% success rate on seen objects and 61.8% across all objects in simulation, with a 6.7% improvement in success rate and a 1.8x increase in functionality ratings over baselines. Simulator-augmented data further boosts performance from 61.8% to 83.4%. The sim-to-real transfer to the LEAP Hand achieves a 85% success rate. Project website is at: https://webgrasp.github.io/.
Occupancy World Model for Robots
Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structured road scenes, while ignoring the exploration of forecasting 3D occupancy scene evolutions for robots in indoor scenes. In this work, we explore a new framework for learning the scene evolutions of observed fine-grained occupancy and propose an occupancy world model based on the combined spatio-temporal receptive field and guided autoregressive transformer to forecast the scene evolutions, called RoboOccWorld. We propose the Conditional Causal State Attention (CCSA), which utilizes camera poses of next state as conditions to guide the autoregressive transformer to adapt and understand the indoor robotics scenarios. In order to effectively exploit the spatio-temporal cues from historical observations, Hybrid Spatio-Temporal Aggregation (HSTA) is proposed to obtain the combined spatio-temporal receptive field based on multi-scale spatio-temporal windows. In addition, we restructure the OccWorld-ScanNet benchmark based on local annotations to facilitate the evaluation of the indoor 3D occupancy scene evolution prediction task. Experimental results demonstrate that our RoboOccWorld outperforms state-of-the-art methods in indoor 3D occupancy scene evolution prediction task. The code will be released soon.
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Visual Imitation Enables Contextual Humanoid Control
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them-casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills-all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
comment: Project website: https://www.videomimic.net/
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Operating robots in open-ended scenarios with diverse tasks is a crucial research and application direction in robotics. While recent progress in natural language processing and large multimodal models has enhanced robots' ability to understand complex instructions, robot manipulation still faces the procedural skill dilemma and the declarative skill dilemma in open environments. Existing methods often compromise cognitive and executive capabilities. To address these challenges, in this paper, we propose RoBridge, a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM), an invariant operable representation (IOR) serving as a symbolic bridge, and a generalist embodied agent (GEA). RoBridge maintains the declarative skill of VLM and unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution. RoBridge demonstrates significant performance improvements over existing baselines, achieving a 75% success rate on new tasks and an 83% average success rate in sim-to-real generalization using only five real-world data samples per task. This work represents a significant step towards integrating cognitive reasoning with physical execution in robotic systems, offering a new paradigm for general robotic manipulation.
comment: project page: https://abliao.github.io/RoBridge/
Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A star algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A star algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential in greatly improving the efficiency and safety when compared to the state of the art spline-based planning method for parking situations.
comment: IEEE Intelligent Vehicles Symposium 2025
Multi-Robot Motion Planning with Diffusion Models ICLR 2025
Diffusion models have recently been successfully applied to a wide range of robotics applications for learning complex multi-modal behaviors from data. However, prior works have mostly been confined to single-robot and small-scale environments due to the high sample complexity of learning multi-robot diffusion models. In this paper, we propose a method for generating collision-free multi-robot trajectories that conform to underlying data distributions while using only single-robot data. Our algorithm, Multi-robot Multi-model planning Diffusion (MMD), does so by combining learned diffusion models with classical search-based techniques -- generating data-driven motions under collision constraints. Scaling further, we show how to compose multiple diffusion models to plan in large environments where a single diffusion model fails to generalize well. We demonstrate the effectiveness of our approach in planning for dozens of robots in a variety of simulated scenarios motivated by logistics environments. View video demonstrations and code at: https://multi-robot-diffusion.github.io/.
comment: The first three authors contributed equally to this work. Published at ICLR 2025
Evaluation Framework for Sensor Configuration Impact on Deep Learning-Based Perception
Current research on automotive perception systems predominantly focusses on either improving the performance of sensor technology or enhancing the perception functions in isolation. High-level perception functions are increasingly based on deep learning (DL) models due to their improved performance and generalisability compared to traditional algorithms. Despite the vital need to evaluate the performance of DL-based perception functions under real-world conditions using onboard sensor inputs, there is a lack of frameworks to implement such systematic evaluations. This paper presents a versatile framework to evaluate the impact of perception sensor modalities and parameter settings on DL-based perception functions. Using a simulation environment, the framework facilitates sensor modality selection and parameter tuning under different operational design domain conditions. Its effectiveness is demonstrated through a case study involving a state-of-the-art surround trajectory prediction model, highlighting performance differences across the sensor modalities radar and camera. Different settings for the parameter, horizontal field of view (HFOV) were evaluated to identify the optimal configuration. The results indicate that a radar sensor with a narrow HFOV is the most suitable configuration for the evaluated perception algorithm. The proposed framework offers a holistic approach to the design of the perception sensor suite, significantly contributing to the development of robust perception systems for automated driving systems.
comment: 11 pages, 10 figures
MonoForce: Learnable Image-conditioned Physics Engine
We propose a novel model for the prediction of robot trajectories on rough offroad terrain from the onboard camera images. This model enforces the laws of classical mechanics through a physics-aware neural symbolic layer while preserving the ability to learn from large-scale data as it is end-to-end differentiable. The proposed hybrid model integrates a black-box component that predicts robot-terrain interaction forces with a neural-symbolic layer. This layer includes a differentiable physics engine that computes the robot's trajectory by querying these forces at the points of contact with the terrain. As the proposed architecture comprises substantial geometrical and physics priors, the resulting model can also be seen as a learnable physics engine conditioned on real images that delivers $10^4$ trajectories per second. We argue and empirically demonstrate that this architecture reduces the sim-to-real gap and mitigates out-of-distribution sensitivity. The differentiability, in conjunction with the rapid simulation speed, makes the model well-suited for various applications including model predictive control, trajectory shooting, supervised and reinforcement learning or SLAM. The codes and data are publicly available.
comment: Code: https://github.com/ctu-vras/monoforce
LRFusionPR: A Polar BEV-Based LiDAR-Radar Fusion Network for Place Recognition
In autonomous driving, place recognition is critical for global localization in GPS-denied environments. LiDAR and radar-based place recognition methods have garnered increasing attention, as LiDAR provides precise ranging, whereas radar excels in adverse weather resilience. However, effectively leveraging LiDAR-radar fusion for place recognition remains challenging. The noisy and sparse nature of radar data limits its potential to further improve recognition accuracy. In addition, heterogeneous radar configurations complicate the development of unified cross-modality fusion frameworks. In this paper, we propose LRFusionPR, which improves recognition accuracy and robustness by fusing LiDAR with either single-chip or scanning radar. Technically, a dual-branch network is proposed to fuse different modalities within the unified polar coordinate bird's eye view (BEV) representation. In the fusion branch, cross-attention is utilized to perform cross-modality feature interactions. The knowledge from the fusion branch is simultaneously transferred to the distillation branch, which takes radar as its only input to further improve the robustness. Ultimately, the descriptors from both branches are concatenated, producing the multimodal global descriptor for place retrieval. Extensive evaluations on multiple datasets demonstrate that our LRFusionPR achieves accurate place recognition, while maintaining robustness under varying weather conditions. Our open-source code will be released at https://github.com/QiZS-BIT/LRFusionPR.
comment: 8 pages, 6 figures
$\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation
Current robots struggle with long-horizon manipulation tasks requiring sequences of prehensile and non-prehensile skills, contact-rich interactions, and long-term reasoning. We present $\texttt{SPIN}$ ($\textbf{S}$kill $\textbf{P}$lanning to $\textbf{IN}$ference), a framework that distills a computationally intensive planning algorithm into a policy via imitation learning. We propose $\texttt{Skill-RRT}$, an extension of RRT that incorporates skill applicability checks and intermediate object pose sampling for solving such long-horizon problems. To chain independently trained skills, we introduce $\textit{connectors}$, goal-conditioned policies trained to minimize object disturbance during transitions. High-quality demonstrations are generated with $\texttt{Skill-RRT}$ and distilled through noise-based replay in order to reduce online computation time. The resulting policy, trained entirely in simulation, transfers zero-shot to the real world and achieves over 80% success across three challenging long-horizon manipulation tasks and outperforms state-of-the-art hierarchical RL and planning methods.
comment: Project website: https://sites.google.com/view/skill-rrt
Bimanual Regrasp Planning and Control for Active Reduction of Object Pose Uncertainty
Precisely grasping an object is a challenging task due to pose uncertainties. Conventional methods have used cameras and fixtures to reduce object uncertainty. They are effective but require intensive preparation, such as designing jigs based on the object geometry and calibrating cameras with high-precision tools fabricated using lasers. In this study, we propose a method to reduce the uncertainty of the position and orientation of a grasped object without using a fixture or a camera. Our method is based on the concept that the flat finger pads of a parallel gripper can reduce uncertainty along its opening/closing direction through flat surface contact. Three orthogonal grasps by parallel grippers with flat finger pads collectively constrain an object's position and orientation to a unique state. Guided by the concepts, we develop a regrasp planning and admittance control approach that sequentially finds and leverages three orthogonal grasps of two robotic arms to actively reduce uncertainties in the object pose. We evaluated the proposed method on different initial object uncertainties and verified that it had good repeatability. The deviation levels of the experimental trials were on the same order of magnitude as those of an optical tracking system, demonstrating strong relative inference performance.
Opening Articulated Structures in the Real World
What does it take to build mobile manipulation systems that can competently operate on previously unseen objects in previously unseen environments? This work answers this question using opening of articulated structures as a mobile manipulation testbed. Specifically, our focus is on the end-to-end performance on this task without any privileged information, i.e. the robot starts at a location with the novel target articulated object in view, and has to approach the object and successfully open it. We first develop a system for this task, and then conduct 100+ end-to-end system tests across 13 real world test sites. Our large-scale study reveals a number of surprising findings: a) modular systems outperform end-to-end learned systems for this task, even when the end-to-end learned systems are trained on 1000+ demonstrations, b) perception, and not precise end-effector control, is the primary bottleneck to task success, and c) state-of-the-art articulation parameter estimation models developed in isolation struggle when faced with robot-centric viewpoints. Overall, our findings highlight the limitations of developing components of the pipeline in isolation and underscore the need for system-level research, providing a pragmatic roadmap for building generalizable mobile manipulation systems. Videos, code, and models are available on the project website: https://arjung128.github.io/opening-articulated-structures/
comment: Accepted to RSS 2025. Project webpage: https://arjung128.github.io/opening-articulated-structures/
PNE-SGAN: Probabilistic NDT-Enhanced Semantic Graph Attention Network for LiDAR Loop Closure Detection
LiDAR loop closure detection (LCD) is crucial for consistent Simultaneous Localization and Mapping (SLAM) but faces challenges in robustness and accuracy. Existing methods, including semantic graph approaches, often suffer from coarse geometric representations and lack temporal robustness against noise, dynamics, and viewpoint changes. We introduce PNE-SGAN, a Probabilistic NDT-Enhanced Semantic Graph Attention Network, to overcome these limitations. PNE-SGAN enhances semantic graphs by using Normal Distributions Transform (NDT) covariance matrices as rich, discriminative geometric node features, processed via a Graph Attention Network (GAT). Crucially, it integrates graph similarity scores into a probabilistic temporal filtering framework (modeled as an HMM/Bayes filter), incorporating uncertain odometry for motion modeling and utilizing forward-backward smoothing to effectively handle ambiguities. Evaluations on challenging KITTI sequences (00 and 08) demonstrate state-of-the-art performance, achieving Average Precision of 96.2\% and 95.1\%, respectively. PNE-SGAN significantly outperforms existing methods, particularly in difficult bidirectional loop scenarios where others falter. By synergizing detailed NDT geometry with principled probabilistic temporal reasoning, PNE-SGAN offers a highly accurate and robust solution for LiDAR LCD, enhancing SLAM reliability in complex, large-scale environments.
comment: We discovered a critical implementation bug in Section 4 (probabilistic NDT-based semantic graph attention module) that invalidates the results shown in Figures 3-4
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Efficient path planning in robotics, particularly within large-scale, dynamic environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability in dynamic scenarios hinder real-time deployment on edge devices. We present SmallPlan -- a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like travel distance and number of trials. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics.
comment: Paper is under review
FrontierNet: Learning Visual Cues to Explore
Exploration of unknown environments is crucial for autonomous robots; it allows them to actively reason and decide on what new data to acquire for different tasks, such as mapping, object discovery, and environmental assessment. Existing solutions, such as frontier-based exploration approaches, rely heavily on 3D map operations, which are limited by map quality and, more critically, often overlook valuable context from visual cues. This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map. We propose a visual-only frontier-based exploration system, with FrontierNet as its core component. FrontierNet is a learning-based model that (i) proposes frontiers, and (ii) predicts their information gain, from posed RGB images enhanced by monocular depth priors. Our approach provides an alternative to existing 3D-dependent goal-extraction approaches, achieving a 15\% improvement in early-stage exploration efficiency, as validated through extensive simulations and real-world experiments. The project is available at https://github.com/cvg/FrontierNet.
Zero-shot Object-Centric Instruction Following: Integrating Foundation Models with Traditional Navigation
Large scale scenes such as multifloor homes can be robustly and efficiently mapped with a 3D graph of landmarks estimated jointly with robot poses in a factor graph, a technique commonly used in commercial robots such as drones and robot vacuums. In this work, we propose Language-Inferred Factor Graph for Instruction Following (LIFGIF), a zero-shot method to ground natural language instructions in such a map. LIFGIF also includes a policy for following natural language navigation instructions in a novel environment while the map is constructed, enabling robust navigation performance in the physical world. To evaluate LIFGIF, we present a new dataset, Object-Centric VLN (OC-VLN), in order to evaluate grounding of object-centric natural language navigation instructions. We compare to two state-of-the-art zero-shot baselines from related tasks, Object Goal Navigation and Vision Language Navigation, to demonstrate that LIFGIF outperforms them across all our evaluation metrics on OCVLN. Finally, we successfully demonstrate the effectiveness of LIFGIF for performing zero-shot object-centric instruction following in the real world on a Boston Dynamics Spot robot.
Extending the Benefits of Parallel Elasticity across Multiple Actuation Tasks: A Geometric and Optimization-Based Approach
A spring in parallel with an effort source (e.g., electric motor or human muscle) can reduce its energy consumption and effort (i.e., torque or force) depending on the spring stiffness, spring preload, and actuation task. However, selecting the spring stiffness and preload that guarantees effort or energy reduction for an arbitrary set of tasks is a design challenge. This work formulates a convex optimization problem to guarantee that a parallel spring reduces the root-mean-square source effort or energy consumption for multiple tasks. Specifically, we guarantee the benefits across multiple tasks by enforcing a set of convex quadratic constraints in our optimization variables, the parallel spring stiffness and preload. These quadratic constraints are equivalent to ellipses in the stiffness and preload plane; any combination of stiffness and preload inside the ellipse represents a parallel spring that minimizes effort source or energy consumption with respect to an actuator without a spring. This geometric interpretation intuitively guides the stiffness and preload selection process. We analytically and experimentally prove the convex quadratic function of the spring stiffness and preload. As applications, we analyze the stiffness and preload selection of a parallel spring for a knee exoskeleton using human muscle as the effort source and a prosthetic ankle powered by electric motors. The source code associated with our framework is available as supplemental open-source software.
Simplification of Robotic System Model Analysis by Petri Net Meta-Model Property Transfer
This paper presents a simplification of robotic system model analysis due to the transfer of Robotic System Hierarchical Petri Net (RSHPN) meta-model properties onto the model of a designed system. Key contributions include: 1) analysis of RSHPN meta-model properties; 2) decomposition of RSHPN analysis into analysis of individual Petri nets, thus the reduction of state space explosion; and 3) transfer of RSHPN meta-model properties onto the produced models, hence elimination of the need for full re-analysis of the RSHPN model when creating new robotic systems. Only task-dependent parts of the model need to be analysed. This approach streamlines the analysis thus reducing the design time. Moreover, it produces a specification which is a solid foundation for the implementation of the system. The obtained results highlight the potential of Petri nets as a valuable formal framework for analysing robotic system properties.
comment: 16 pages
Prismatic-Bending Transformable (PBT) Joint for a Modular, Foldable Manipulator with Enhanced Reachability and Dexterity
Robotic manipulators, traditionally designed with classical joint-link articulated structures, excel in industrial applications but face challenges in human-centered and general-purpose tasks requiring greater dexterity and adaptability. To address these challenges, we propose the Prismatic-Bending Transformable (PBT) Joint, a novel, scissors-inspired mechanism with directional maintenance capability that provides bending, rotation, and elongation/contraction within a single module. This design enables transformable kinematic chains that are modular, reconfigurable, and scalable for diverse tasks. We detail the mechanical design, optimization, kinematic and dynamic modeling, and experimental validation of the PBT joint, demonstrating its integration into foldable, modular robotic manipulators. The PBT joint functions as a single stock keeping unit (SKU), enabling manipulators to be constructed entirely from standardized PBT joints. It also serves as a modular extension for existing systems, such as wrist modules, streamlining design, deployment, transportation, and maintenance. Three joint sizes have been developed and tested, showcasing enhanced dexterity, reachability, and adaptability, particularly in confined and cluttered spaces. This work presents a promising approach to robotic manipulator development, providing a compact and versatile solution for operation in dynamic and constrained environments.
Global-Local Interface with Selective Direct and Singularity-Avoiding Motion Mapping for Intuitive Teleoperation
This paper presents the Global-Local Teleoperation Interface, a hierarchical framework that enhances human-robot interaction by decoupling large-scale manipulator positioning from fine end-effector manipulation. The global component enables efficient workspace traversal, while the local component facilitates precise and dexterous control. To address slave-side kinematic singularities-especially during fine manipulation, we propose a singularity-avoiding motion mapping strategy that enhances both stability and intuitiveness. We further introduce the concept of an operational Jacobian to characterize the smoothness of joint motion under local control. The G-L interface is implemented in two variants: Direct Mapping and Singularity-Avoiding Mapping, and is validated through hardware experiments involving precision tasks and complex motion. Results show substantial improvements in task success rate, efficiency, and user experience over conventional global or local-only systems.
Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning
Agents trained via reinforcement learning (RL) often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.
Systems and Control (CS)
Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems
This paper presents a novel approach for optimizing the scheduling and control of Pan-Tilt-Zoom (PTZ) cameras in dynamic surveillance environments. The proposed method integrates Kalman filters for motion prediction with a dynamic network flow model to enhance real-time video capture efficiency. By assigning Kalman filters to tracked objects, the system predicts future locations, enabling precise scheduling of camera tasks. This prediction-driven approach is formulated as a network flow optimization, ensuring scalability and adaptability to various surveillance scenarios. To further reduce redundant monitoring, we also incorporate group-tracking nodes, allowing multiple objects to be captured within a single camera focus when appropriate. In addition, a value-based system is introduced to prioritize camera actions, focusing on the timely capture of critical events. By adjusting the decay rates of these values over time, the system ensures prompt responses to tasks with imminent deadlines. Extensive simulations demonstrate that this approach improves coverage, reduces average wait times, and minimizes missed events compared to traditional master-slave camera systems. Overall, our method significantly enhances the efficiency, scalability, and effectiveness of surveillance systems, particularly in dynamic and crowded environments.
comment: 7 pages, 3 Figures, Accepted at AIRC 2025
Consensus Seminorms and their Applications
Consensus is a well-studied problem in distributed sensing, computation and control, yet deriving useful and easily computable bounds on the rate of convergence to consensus remains a challenge. We study the applications of seminorms for this goal. We revisit a previously suggested family of seminorms and correct an error made in their original presentation where it was claimed that the a certain seminorm is equal to the well-known coefficient of ergodicity. We then propose a wider family of seminorms which guarantee convergence at an exponential rate of infinite products of matrices which generalizes known results on stochastic matrices to the class of matrices whose row sums are all equal one. Finally, we show that such seminorms cannot be used to bound the rate of convergence of classes larger than the well-known class of scrambling matrices, and pose several open questions for future research.
Integrated equilibrium model for electrified logistics and power systems
This paper proposes an integrated equilibrium model to characterize the complex interactions between electrified logistics systems and electric power delivery systems. The model consists of two major players: an electrified logistics operator (ELO) and a power system operator (PSO). The ELO aims to maximize its profit by strategically scheduling and routing its electric delivery vehicles (e-trucks) for deliveries and charging, in response to the locational marginal price (LMP) set by the PSO. The routing, delivery, and charging behaviors of e-trucks are modeled by a perturbed utility Markov decision process (PU-MDP) while their collective operations are optimized to achieve the ELO's objective by designing rewards in the PU-MDP. On the other hand, PSO optimizes the energy price by considering both the spatiotemporal e-truck charging demand and the base electricity load. The equilibrium of the integrated system is formulated as a fixed point, proved to exist under mild assumptions, and solved for a case study on the Hawaii network via Anderson's fixed-point acceleration algorithm. Along with these numerical results, this paper provides both theoretical insights and practical guidelines to achieve sustainable and efficient operations in modern electrified logistics and power systems.
Opinion Dynamics on Signed Graphs and Graphons
In this paper, we make use of graphon theory to study opinion dynamics on large undirected networks. The opinion dynamics models that we take into consideration allow for negative interactions between the individuals, whose opinions can thus grow apart. We consider both the repelling and the opposing models of negative interactions, which have been studied in the literature. We define the repelling and the opposing dynamics on signed graphons and we show that their initial value problem solutions exist and are unique. We then show that, in a suitable sense, the graphon dynamics is a good approximation of the dynamics on large graphs that converge to a graphon. This result applies to large random graphs that are sampled according to a graphon (W-random graphs), for which we provide a new convergence result under very general assumptions.
Consensus-Aware AV Behavior: Trade-offs Between Safety, Interaction, and Performance in Mixed Urban Traffic
Transportation systems have long been shaped by complexity and heterogeneity, driven by the interdependency of agent actions and traffic outcomes. The deployment of automated vehicles (AVs) in such systems introduces a new challenge: achieving consensus across safety, interaction quality, and traffic performance. In this work, we position consensus as a fundamental property of the traffic system and aim to quantify it. We use high-resolution trajectory data from the Third Generation Simulation (TGSIM) dataset to empirically analyze AV and human-driven vehicle (HDV) behavior at a signalized urban intersection and around vulnerable road users (VRUs). Key metrics, including Time-to-Collision (TTC), Post-Encroachment Time (PET), deceleration patterns, headways, and string stability, are evaluated across the three performance dimensions. Results show that full consensus across safety, interaction, and performance is rare, with only 1.63% of AV-VRU interaction frames meeting all three conditions. These findings highlight the need for AV models that explicitly balance multi-dimensional performance in mixed-traffic environments. Full reproducibility is supported via our open-source codebase on https://github.com/wissamkontar/Consensus-AV-Analysis.
comment: 7 pages, 8 figures
NN-Based Joint Mitigation of IQ Imbalance and PA Nonlinearity With Multiple States
Joint mitigation of IQ imbalance and PA nonlinearity is important for improving the performance of radio frequency (RF) transmitters. In this paper, we propose a new neural network (NN) model, which can be used for joint digital pre-distortion (DPD) of non-ideal IQ modulators and PAs in a transmitter with multiple operating states. The model is based on the methodology of multi-task learning (MTL). In this model, the hidden layers of the main NN are shared by all signal states, and the output layer's weights and biases are dynamically generated by another NN. The experimental results show that the proposed model can effectively perform joint DPD for IQ-PA systems, and it achieves better overall performance within multiple signal states than the existing methods.
Self-Calibrating Position Measurements: Applied to Imperfect Hall Sensors
Linear Hall sensors are a cost-effective alternative to optical encoders for measuring the rotor positions of actuators, with the main challenge being that they exhibit position-dependent inaccuracies resulting from manufacturing tolerances. This paper develops a data-driven calibration procedure for linear analog Hall sensors that enables accurate online estimates of the rotor angle without requiring expensive external encoders. The approach combines closed-loop data collection with nonlinear identification to obtain an accurate model of the sensor inaccuracies, which is subsequently used for online compensation. Simulation results show that when the flux density model structure is known, measurement errors are reduced to the sensor noise floor, and experiments on an industrial setup demonstrate a factor of 2.6 reduction in the root-mean-square measurement error. These results confirm that Hall sensor inaccuracies can be calibrated even when no external encoder is available, improving their practical applicability.
comment: 6 pages, 8 figures, final version, accepted for the joint 10th IFAC Symposium on Mechatronic Systems & 14th IFAC Symposium on Robotics
Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections
Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging global perception and vehicle-to-infrastructure (V2I) communication. The core of the system is an RSU-based decision-making module using a two-stage hybrid reinforcement learning (RL) framework. At first, policies are pre-trained offline using conservative Q-learning (CQL) combined with behavior cloning (BC) on collected dataset. Subsequently, these policies are fine-tuned in the simulation using multi-agent proximal policy optimization (MAPPO), aligned with a self-attention mechanism to effectively solve inter-agent dependencies. RSUs perform real-time inference based on the trained models to realize vehicle control via V2I communications. Extensive experiments in CARLA environment demonstrate high effectiveness of the proposed system, by: \textit{(i)} achieving failure rates below 0.03\% in coordinating three connected and autonomous vehicles (CAVs) through complex intersection scenarios, significantly outperforming the traditional Autoware control method, and \textit{(ii)} exhibiting strong robustness across varying numbers of controlled agents and shows promising generalization capabilities on other maps.
comment: 7 pages
Beyond Task Performance: Human Experience in Human-Robot Collaboration
Human interaction experience plays a crucial role in the effectiveness of human-machine collaboration, especially as interactions in future systems progress towards tighter physical and functional integration. While automation design has been shown to impact task performance, its influence on human experience metrics such as flow, sense of agency (SoA), and embodiment remains underexplored. This study investigates how variations in automation design affect these psychological experience measures and examines correlations between subjective experience and physiological indicators. A user study was conducted in a simulated wood workshop, where participants collaborated with a lightweight robot under four automation levels. The results of the study indicate that medium automation levels enhance flow, SoA and embodiment, striking a balance between support and user autonomy. In contrast, higher automation, despite optimizing task performance, diminishes perceived flow and agency. Furthermore, we observed that grip force might be considered as a real-time proxy of SoA, while correlations with heart rate variability were inconclusive. The findings underscore the necessity for automation strategies that integrate human- centric metrics, aiming to optimize both performance and user experience in collaborative robotic systems
Impact of Grid-Forming Inverters on Protective Relays: A Perspective for Current Limiting Control Design
Grid-forming (GFM) inverters can significantly alter the fault characteristics of power systems, which challenges the proper function of protective relays. This paper gives a holistic analysis of the interaction between GFM inverter-based resources (IBRs) and the supervising elements in protective relays, including directional and phase selection elements. It is revealed that the current limiting control (CLC) that is based on the current reference saturation method, adversely affects the performance of supervising elements that rely on the negative-sequence quantities. In contrast, adopting highly inductive virtual impedance in the CLC enables a reliable operation of such elements. This finding provides insights into the design of CLC for GFM IBRs from a protection perspective. It is further found that even with a highly inductive virtual impedance, the altered virtual impedance dynamics introduced by the CLC can still lead to malfunctions of the incremental quantity-based supervising elements. These theoretical findings are corroborated by simulations and controller hardware-in-the-loop (CHIL) tests.
Energy Efficient RSMA-Based LEO Satellite Communications Assisted by UAV-Mounted BD-Active RIS: A DRL Approach
This paper proposes an advanced non-terrestrial communication architecture that integrates Rate-Splitting Multiple Access (RSMA) with a Beyond-Diagonal Active Reconfigurable Intelligent Surface (BD-ARIS) mounted on a UAV under the coverage of a Low Earth Orbit (LEO) satellite. The BD-ARIS adopts a group-connected structure to enhance signal amplification and adaptability, while RSMA enables efficient multi-user access by dividing messages into common and private components. The system jointly optimizes satellite beamforming, UAV positioning, power allocation, and rate-splitting ratios to maximize the overall energy efficiency (EE). To solve the resulting non-convex and high-dimensional problem, we employ three state-of-the-art deep reinforcement learning (DRL) algorithms: Trust Region Policy Optimization (TRPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Asynchronous Advantage Actor-Critic (A3C). Moreover, realistic models for the power consumption of both the UAV and the BD-ARIS are considered. Simulation results reveal that TRPO consistently achieves the best performance in terms of EE and sum rate, especially under high transmit powers and challenging deployment scenarios. TD3 converges faster and performs competitively in moderate settings, while A3C suffers from instability due to its high variance. Additionally, the robustness of each algorithm under channel state information (CSI) uncertainty is evaluated, confirming TRPO resilience to imperfect observations. Overall, the proposed RSMA-BD-ARIS framework significantly outperforms conventional RIS-assisted designs and provides a scalable, energy-efficient solution for 6G and massive IoT applications in non-terrestrial networks.
Scalable 49-Channel Neural Recorder with an Event-Driven Ramp ADC and PCA Compression in 28 nm CMOS
Neural interfaces advance neuroscience research and therapeutic innovations by accurately measuring neuronal activity. However, recording raw data from numerous neurons results in substantial amount of data and poses challenges for wireless transmission. While conventional neural recorders consume energy to digitize and process the full neural signal, only a fraction of this data carries essential spiking information. Leveraging on this signal sparsity, this paper introduces a neural recording integrated circuit in TSMC 28nm CMOS. It features an event-driven ramp analog-to-digital converter, and a spike compression module based on principal component analysis. The circuit consists of 49 channels, each occupying an on-chip area of 50 $\times$ 60 $\mu$m$^2$. The circuit measures 1370 $\times$ 1370 $\mu$m$^2$ and consumes 534 $\mu$W. Compression testing on a synthetic dataset demonstrated an 8.8-fold reduction compared to raw spikes and a 328-fold reduction relative to the raw signal. This compression approach maintained a spike sorting accuracy of 74.9%, compared to the 79.5% accuracy obtained with the raw signal. The paper details the architecture and performance outcomes of the neural recording circuit and its compression module.
UX-aware Rate Allocation for Real-Time Media
Immersive communications is a key use case for 6G where applications require reliable latency-bound media traffic at a certain data rate to deliver an acceptable User Experience (UX) or Quality-of-Experience (QoE). The Quality-of-Service (QoS) framework of current cellular systems (4G and 5G) and prevalent network congestion control algorithms for latency-bound traffic like L4S typically target network-related Key Performance Indicators (KPIs) such as data rates and latencies. Network capacity is based on the number of users that attain these KPIs. However, the UX of an immersive application for a given data rate and latency is not the same across users, since it depends on other factors such as the complexity of the media being transmitted and the encoder format. This implies that guarantees on network KPIs do not necessarily translate to guarantees on the UX. In this paper, we propose a framework in which the communication network can provide guarantees on the UX. The framework requires application servers to share real-time information on UX dependency on data rate to the network, which in turn, uses this information to maximize a UX-based network utility function. Our framework is motivated by the recent industry trends of increasing application awareness at the network, and pushing application servers towards the edge, allowing for tighter coordination between the servers and the 6G system. Our simulation results show that the proposed framework substantially improves the UX capacity of the network, which is the number of users above a certain UX threshold, compared to conventional rate control algorithms.
Joint User Association and Bandwidth Assignment for Digital Twin-Assisted Multi-RAT Networks
In this paper, we investigate user equipment (UE)-radio access technology (RAT) association and bandwidth assignment to maximize sum-rates in a multi-RAT network. To this end, we formulate an optimization problem that jointly addresses UE association and bandwidth allocation, adhering to practical constraints. Because of the NP-hard nature of this problem, finding a globally optimal solution is computationally infeasible. To address this challenge, we propose a centralized and computationally efficient heuristic algorithm that aims to maximize sum-rates while enhancing quality of service (QoS). Yet, the proposed approach requires global channel state information (CSI) for near-optimal performance, which incurs substantial overhead and data collection costs in large-scale multi-RAT networks. To alleviate this burden, we use a digital twin (DT) of the multi-RAT network, leveraging its context-awareness to acquire global CSI with reduced overhead. Our numerical results reveal that our approach improves sum-rates by up to 43% over baseline method, with less than a 5% deviation from the theoretical optimal solution, while achieving up to a 43% improvement in QoS. Further analysis reveals that our method not only surpasses the optimal solution in terms of QoS enhancement, but also ensures significant computational efficiency.
comment: 6 pages, 6 figures, Accepted in IEEE ICC 2025
Geometric Fault-Tolerant Neural Network Tracking Control of Unknown Systems on Matrix Lie Groups
We present a geometric neural network-based tracking controller for systems evolving on matrix Lie groups under unknown dynamics, actuator faults, and bounded disturbances. Leveraging the left-invariance of the tangent bundle of matrix Lie groups, viewed as an embedded submanifold of the vector space $\R^{N\times N}$, we propose a set of learning rules for neural network weights that are intrinsically compatible with the Lie group structure and do not require explicit parameterization. Exploiting the geometric properties of Lie groups, this approach circumvents parameterization singularities and enables a global search for optimal weights. The ultimate boundedness of all error signals -- including the neural network weights, the coordinate-free configuration error function, and the tracking velocity error -- is established using Lyapunov's direct method. To validate the effectiveness of the proposed method, we provide illustrative simulation results for decentralized formation control of multi-agent systems on the Special Euclidean group.
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Toward a Harmonized Approach - Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Generating Building-Level Heat Demand Time Series by Combining Occupancy Simulations and Thermal Modeling
Despite various efforts, decarbonizing the heating sector remains a significant challenge. To tackle it by smart planning, the availability of highly resolved heating demand data is key. Several existing models provide heating demand only for specific applications. Typically, they either offer time series for a larger area or annual demand data on a building level, but not both simultaneously. Additionally, the diversity in heating demand across different buildings is often not considered. To address these limitations, this paper presents a novel method for generating temporally resolved heat demand time series at the building level using publicly available data. The approach integrates a thermal building model with stochastic occupancy simulations that account for variability in user behavior. As a result, the tool serves as a cost-effective resource for cross-sectoral energy system planning and policy development, particularly with a focus on the heating sector. The obtained data can be used to assess the impact of renovation and retrofitting strategies, or to analyze district heating expansion. To illustrate the potential applications of this approach, we conducted a case study in Puertollano (Spain), where we prepared a dataset of heating demand with hourly resolution for each of 9,298 residential buildings. This data was then used to compare two different pathways for the thermal renovation of these buildings. By relying on publicly available data, this method can be adapted and applied to various European regions, offering broad usability in energy system optimization and analysis of decarbonization strategies.
Randomized Transport Plans via Hierarchical Fully Probabilistic Design
An optimal randomized strategy for design of balanced, normalized mass transport plans is developed. It replaces -- but specializes to -- the deterministic, regularized optimal transport (OT) strategy, which yields only a certainty-equivalent plan. The incompletely specified -- and therefore uncertain -- transport plan is acknowledged to be a random process. Therefore, hierarchical fully probabilistic design (HFPD) is adopted, yielding an optimal hyperprior supported on the set of possible transport plans, and consistent with prior mean constraints on the marginals of the uncertain plan. This Bayesian resetting of the design problem for transport plans -- which we call HFPD-OT -- confers new opportunities. These include (i) a strategy for the generation of a random sample of joint transport plans; (ii) randomized marginal contracts for individual source-target pairs; and (iii) consistent measures of uncertainty in the plan and its contracts. An application in fair market matching is outlined, in which HFPD-OT enables the recruitment of a more diverse subset of contracts -- than is possible in classical OT -- into the delivery of an expected plan.
comment: 25 pages, 26 figures
Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A star algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A star algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential in greatly improving the efficiency and safety when compared to the state of the art spline-based planning method for parking situations.
comment: IEEE Intelligent Vehicles Symposium 2025
Measurement-Based Line-Impedance Estimation in the Absence of Phasor Measurement Units
This paper proposes and compares experimentally several methods to estimate the series resistance and reactance (i.e., the transversal components of the $\pi$-model of a line) of low-voltage lines in distribution grids. It first shows that if phasor measurements are available and the grid nodal voltages and power injections are known, the problem can be formulated and solved as a conventional load flow with properly adjusted unknowns. To solve this problem, we propose an analytical derivation of the Jacobian matrix. If only RMS values are available, such as from smart meters, integrating information from multiple intervals becomes necessary, ultimately opening to least-squares estimations, widely adopted in the literature. In this context, applying the proposed Jacobian contributes to accelerating the problem resolution of existing algorithms. The methods are compared in terms of estimation performance and convergence by using measurements from an experimental distribution grid interfacing real-world components and with realistic size implemented at the Gridlab at HES-SO Valais.
comment: 2025 IEEE Kiel PowerTech
Spectrum Sharing in Satellite-Terrestrial Integrated Networks: Frameworks, Approaches, and Opportunities
To accommodate the increasing communication needs in non-terrestrial networks (NTNs), wireless users in remote areas may require access to more spectrum than is currently allocated. Terrestrial networks (TNs), such as cellular networks, are deployed in specific areas, but many underused licensed spectrum bands remain in remote areas. Therefore, bringing NTNs to a shared spectrum with TNs can improve network capacity under reasonable interference management. However, in satellite-terrestrial integrated networks (STINs), the comprehensive coverage of a satellite and the unbalanced communication resources of STINs make it challenging to effectively manage mutual interference between NTN and TN. This article presents the fundamentals and prospects of spectrum sharing (SS) in STINs by introducing four SS frameworks, their potential application scenarios, and technical challenges. Furthermore, advanced SS approaches related to interference management in STINs and performance metrics of SS in STINs are introduced. Moreover, a preliminary performance evaluation showcases the potential for sharing the spectrum between NTN and TN. Finally, future research opportunities for SS in STINs are discussed.
Set-based and Dynamical Feedback-augmented Hands-off Control
A novel set-theoretical approach to hands-off control is proposed, which focuses on spatial arguments for command limitation, rather than temporal ones. By employing dynamical feedback alongside invariant set-based constraints, actuation is employed only to drive the system's state inside a "hands-off region" of its state-space, where the plant may freely evolve in open-loop configuration. A computationally-efficient procedure with strong theoretical guarantees is devised, and its effectiveness is showcased via an intuitive practical example.
comment: 10 pages, 4 figures
Design of a compact low loss 2-way millimetre wave power divider for future communication
In this paper, a rectangular-shaped power divider has been presented operating at 27.9 GHz. The power divider has achieved acceptable results for important parameters such as S11, S12, S21, and S22. The substrate employed for the power divider is Roger 3003 which has a thickness of 1.6 mm. This power divider provides a reflection coefficient of -12.2 dB and an insertion loss of 3.1 dB at 28 GHz. This ka-band T-junction power divider covers 68% of the bandwidth. Dimensions of the ka-band T-junction power divider are 50x80 mm. Due to its dimensions and bandwidth this power divider is more suitable for millimetre wave applications like RADAR, beamforming, and 5G applications.
comment: 7 pages, 6 figures, 2 tables
Toward a Harmonized Approach -- Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Disturbance-adaptive Model Predictive Control for Bounded Average Constraint Violations
This paper considers stochastic linear time-invariant systems subject to constraints on the average number of state-constraint violations over time without knowing the disturbance distribution. We present a novel disturbance-adaptive model predictive control (DAD-MPC) framework, which adjusts the disturbance model based on measured constraint violations. Using a robust invariance method, DAD-MPC ensures recursive feasibility and guarantees asymptotic or robust bounds on average constraint violations. Additionally, the bounds hold even with an inaccurate disturbance model, which allows for data-driven disturbance quantification methods to be used, such as conformal prediction. Simulation results demonstrate that the proposed approach outperforms state-of-the-art methods while satisfying average violation constraints.
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon, discrete, discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound matching our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
Introducing Modelling, Analysis and Control of Three-Phase Electrical Systems Using Geometric Algebra
State-of-the-art techniques for modeling, analysis and control of three-phase electrical systems belong to the real-valued multi-input/multi-output (MIMO) domain, or to the complex-valued nonlinear single-input/single-output (SISO) domain. In order to complement both domains while simplifying complexity and offering new analysis and design perspectives, this paper introduces the application of geometric algebra (GA) principles to the modeling, analysis and control of three-phase electrical systems. The key contribution for the modeling part is the identification of the transformation that allows transferring real-valued linear MIMO systems into GA-valued linear SISO representations (with independence of having a balanced or unbalanced system). Closed-loop stability analysis in the new space is addressed by using intrinsic properties of GA. In addition, a recipe for designing stabilizing and decoupling GA-valued controllers is provided. Numerical examples illustrate key developments and experiments corroborate the main findings.
comment: 10 pages, 5 figures, final version of an accepted journal paper (without the journal format)
A Unified Approach to Enforce Non-Negativity Constraint in Neural Network Approximation for Optimal Voltage Regulation (preprint)
Power system voltage regulation is crucial to maintain power quality while integrating intermittent renewable resources in distribution grids. However, the system model on the grid edge is often unknown, making it difficult to model physical equations for optimal control. Therefore, previous work proposes structured data-driven methods like input convex neural networks (ICNN) for "optimal" control without relying on a physical model. While ICNNs offer theoretical guarantees based on restrictive assumptions of non-negative neural network parameters, can one improve the approximation power with an extra step on negative duplication of inputs? We show that such added mirroring step fails to improve accuracy, as a linear combination of the original input and duplicated input is equivalent to a linear operation of ICNN's input without duplication. While this design can not improve performance, we propose a unified approach to embed the non-negativity constraint as a regularized optimization of the neural network, contrary to the existing methods, which added a loosely integrated second step for post-processing on parameter negation. Our integration directly ties back-propagation to simultaneously minimizing the approximation error while enforcing the convexity constraints. Numerical experiments validate the issues of the mirroring method and show that our integrated objective can avoid problems such as unstable training and non-convergence existing in other methods for optimal control.
comment: Submitted to the 58th Hawaii International Conference on System Sciences (HICSS-58)
Neural Operators for Predictor Feedback Control of Nonlinear Delay Systems
Predictor feedback designs are critical for delay-compensating controllers in nonlinear systems. However, these designs are limited in practical applications as predictors cannot be directly implemented, but require numerical approximation schemes, which become computationally prohibitive when system dynamics are expensive to compute. To address this challenge, we recast the predictor design as an operator learning problem, and learn the predictor mapping via a neural operator. We prove the existence of an arbitrarily accurate neural operator approximation of the predictor operator. Under the approximated predictor, we achieve semiglobal practical stability of the closed-loop nonlinear delay system. The estimate is semiglobal in a unique sense - one can enlarge the set of initial states as desired, though this increases the difficulty of training a neural operator, which appears practically in the stability estimate. Furthermore, our analysis holds for any black-box predictor satisfying the universal approximation error bound. We demonstrate the approach by controlling a 5-link robotic manipulator with different neural operator models, achieving significant speedups compared to classic predictor feedback schemes while maintaining closed-loop stability.
comment: 26 pages. Learning for Dynamics and Control 2025
Proximal Gradient Dynamics and Feedback Control for Equality-Constrained Composite Optimization
This paper studies equality-constrained composite minimization problems. This class of problems, capturing regularization terms and convex inequality constraints, naturally arises in a wide range of engineering and machine learning applications. To tackle these minimization problems, we introduce the \emph{proportional--integral proximal gradient dynamics} (PI--PGD): a closed-loop system where the Lagrange multipliers are control inputs and states are the problem decision variables. First, we establish the equivalence between the minima of the optimization problem and the equilibria of the PI--PGD. Then {for the case of affine constraints}, {by} leveraging tools from contraction theory we give a comprehensive convergence analysis for the dynamics, showing linear--exponential convergence towards the equilibrium. That is, the distance between each solution and the equilibrium is upper bounded by a function that first decreases linearly and then exponentially. Our findings are illustrated numerically on a set of representative examples, which include an application to entropy-regularized optimal transport.
comment: 14 pages, 7 figures
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front ICLR 2025
Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previous dominating MORL methods typically generate a fixed policy set or preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure the efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimension of the state and preference space grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each optimized towards its individual preference over the multiple objectives. Then, to fill the remaining vacancies in the Pareto front, the constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent advancements in MORL methods, our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).
comment: Published as a conference paper at ICLR 2025. Code available at https://github.com/RuohLiuq/C-MORL
Systems and Control (EESS)
Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems
This paper presents a novel approach for optimizing the scheduling and control of Pan-Tilt-Zoom (PTZ) cameras in dynamic surveillance environments. The proposed method integrates Kalman filters for motion prediction with a dynamic network flow model to enhance real-time video capture efficiency. By assigning Kalman filters to tracked objects, the system predicts future locations, enabling precise scheduling of camera tasks. This prediction-driven approach is formulated as a network flow optimization, ensuring scalability and adaptability to various surveillance scenarios. To further reduce redundant monitoring, we also incorporate group-tracking nodes, allowing multiple objects to be captured within a single camera focus when appropriate. In addition, a value-based system is introduced to prioritize camera actions, focusing on the timely capture of critical events. By adjusting the decay rates of these values over time, the system ensures prompt responses to tasks with imminent deadlines. Extensive simulations demonstrate that this approach improves coverage, reduces average wait times, and minimizes missed events compared to traditional master-slave camera systems. Overall, our method significantly enhances the efficiency, scalability, and effectiveness of surveillance systems, particularly in dynamic and crowded environments.
comment: 7 pages, 3 Figures, Accepted at AIRC 2025
Consensus Seminorms and their Applications
Consensus is a well-studied problem in distributed sensing, computation and control, yet deriving useful and easily computable bounds on the rate of convergence to consensus remains a challenge. We study the applications of seminorms for this goal. We revisit a previously suggested family of seminorms and correct an error made in their original presentation where it was claimed that the a certain seminorm is equal to the well-known coefficient of ergodicity. We then propose a wider family of seminorms which guarantee convergence at an exponential rate of infinite products of matrices which generalizes known results on stochastic matrices to the class of matrices whose row sums are all equal one. Finally, we show that such seminorms cannot be used to bound the rate of convergence of classes larger than the well-known class of scrambling matrices, and pose several open questions for future research.
Integrated equilibrium model for electrified logistics and power systems
This paper proposes an integrated equilibrium model to characterize the complex interactions between electrified logistics systems and electric power delivery systems. The model consists of two major players: an electrified logistics operator (ELO) and a power system operator (PSO). The ELO aims to maximize its profit by strategically scheduling and routing its electric delivery vehicles (e-trucks) for deliveries and charging, in response to the locational marginal price (LMP) set by the PSO. The routing, delivery, and charging behaviors of e-trucks are modeled by a perturbed utility Markov decision process (PU-MDP) while their collective operations are optimized to achieve the ELO's objective by designing rewards in the PU-MDP. On the other hand, PSO optimizes the energy price by considering both the spatiotemporal e-truck charging demand and the base electricity load. The equilibrium of the integrated system is formulated as a fixed point, proved to exist under mild assumptions, and solved for a case study on the Hawaii network via Anderson's fixed-point acceleration algorithm. Along with these numerical results, this paper provides both theoretical insights and practical guidelines to achieve sustainable and efficient operations in modern electrified logistics and power systems.
Opinion Dynamics on Signed Graphs and Graphons
In this paper, we make use of graphon theory to study opinion dynamics on large undirected networks. The opinion dynamics models that we take into consideration allow for negative interactions between the individuals, whose opinions can thus grow apart. We consider both the repelling and the opposing models of negative interactions, which have been studied in the literature. We define the repelling and the opposing dynamics on signed graphons and we show that their initial value problem solutions exist and are unique. We then show that, in a suitable sense, the graphon dynamics is a good approximation of the dynamics on large graphs that converge to a graphon. This result applies to large random graphs that are sampled according to a graphon (W-random graphs), for which we provide a new convergence result under very general assumptions.
Consensus-Aware AV Behavior: Trade-offs Between Safety, Interaction, and Performance in Mixed Urban Traffic
Transportation systems have long been shaped by complexity and heterogeneity, driven by the interdependency of agent actions and traffic outcomes. The deployment of automated vehicles (AVs) in such systems introduces a new challenge: achieving consensus across safety, interaction quality, and traffic performance. In this work, we position consensus as a fundamental property of the traffic system and aim to quantify it. We use high-resolution trajectory data from the Third Generation Simulation (TGSIM) dataset to empirically analyze AV and human-driven vehicle (HDV) behavior at a signalized urban intersection and around vulnerable road users (VRUs). Key metrics, including Time-to-Collision (TTC), Post-Encroachment Time (PET), deceleration patterns, headways, and string stability, are evaluated across the three performance dimensions. Results show that full consensus across safety, interaction, and performance is rare, with only 1.63% of AV-VRU interaction frames meeting all three conditions. These findings highlight the need for AV models that explicitly balance multi-dimensional performance in mixed-traffic environments. Full reproducibility is supported via our open-source codebase on https://github.com/wissamkontar/Consensus-AV-Analysis.
comment: 7 pages, 8 figures
NN-Based Joint Mitigation of IQ Imbalance and PA Nonlinearity With Multiple States
Joint mitigation of IQ imbalance and PA nonlinearity is important for improving the performance of radio frequency (RF) transmitters. In this paper, we propose a new neural network (NN) model, which can be used for joint digital pre-distortion (DPD) of non-ideal IQ modulators and PAs in a transmitter with multiple operating states. The model is based on the methodology of multi-task learning (MTL). In this model, the hidden layers of the main NN are shared by all signal states, and the output layer's weights and biases are dynamically generated by another NN. The experimental results show that the proposed model can effectively perform joint DPD for IQ-PA systems, and it achieves better overall performance within multiple signal states than the existing methods.
Self-Calibrating Position Measurements: Applied to Imperfect Hall Sensors
Linear Hall sensors are a cost-effective alternative to optical encoders for measuring the rotor positions of actuators, with the main challenge being that they exhibit position-dependent inaccuracies resulting from manufacturing tolerances. This paper develops a data-driven calibration procedure for linear analog Hall sensors that enables accurate online estimates of the rotor angle without requiring expensive external encoders. The approach combines closed-loop data collection with nonlinear identification to obtain an accurate model of the sensor inaccuracies, which is subsequently used for online compensation. Simulation results show that when the flux density model structure is known, measurement errors are reduced to the sensor noise floor, and experiments on an industrial setup demonstrate a factor of 2.6 reduction in the root-mean-square measurement error. These results confirm that Hall sensor inaccuracies can be calibrated even when no external encoder is available, improving their practical applicability.
comment: 6 pages, 8 figures, final version, accepted for the joint 10th IFAC Symposium on Mechatronic Systems & 14th IFAC Symposium on Robotics
Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections
Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging global perception and vehicle-to-infrastructure (V2I) communication. The core of the system is an RSU-based decision-making module using a two-stage hybrid reinforcement learning (RL) framework. At first, policies are pre-trained offline using conservative Q-learning (CQL) combined with behavior cloning (BC) on collected dataset. Subsequently, these policies are fine-tuned in the simulation using multi-agent proximal policy optimization (MAPPO), aligned with a self-attention mechanism to effectively solve inter-agent dependencies. RSUs perform real-time inference based on the trained models to realize vehicle control via V2I communications. Extensive experiments in CARLA environment demonstrate high effectiveness of the proposed system, by: \textit{(i)} achieving failure rates below 0.03\% in coordinating three connected and autonomous vehicles (CAVs) through complex intersection scenarios, significantly outperforming the traditional Autoware control method, and \textit{(ii)} exhibiting strong robustness across varying numbers of controlled agents and shows promising generalization capabilities on other maps.
comment: 7 pages
Beyond Task Performance: Human Experience in Human-Robot Collaboration
Human interaction experience plays a crucial role in the effectiveness of human-machine collaboration, especially as interactions in future systems progress towards tighter physical and functional integration. While automation design has been shown to impact task performance, its influence on human experience metrics such as flow, sense of agency (SoA), and embodiment remains underexplored. This study investigates how variations in automation design affect these psychological experience measures and examines correlations between subjective experience and physiological indicators. A user study was conducted in a simulated wood workshop, where participants collaborated with a lightweight robot under four automation levels. The results of the study indicate that medium automation levels enhance flow, SoA and embodiment, striking a balance between support and user autonomy. In contrast, higher automation, despite optimizing task performance, diminishes perceived flow and agency. Furthermore, we observed that grip force might be considered as a real-time proxy of SoA, while correlations with heart rate variability were inconclusive. The findings underscore the necessity for automation strategies that integrate human- centric metrics, aiming to optimize both performance and user experience in collaborative robotic systems
Impact of Grid-Forming Inverters on Protective Relays: A Perspective for Current Limiting Control Design
Grid-forming (GFM) inverters can significantly alter the fault characteristics of power systems, which challenges the proper function of protective relays. This paper gives a holistic analysis of the interaction between GFM inverter-based resources (IBRs) and the supervising elements in protective relays, including directional and phase selection elements. It is revealed that the current limiting control (CLC) that is based on the current reference saturation method, adversely affects the performance of supervising elements that rely on the negative-sequence quantities. In contrast, adopting highly inductive virtual impedance in the CLC enables a reliable operation of such elements. This finding provides insights into the design of CLC for GFM IBRs from a protection perspective. It is further found that even with a highly inductive virtual impedance, the altered virtual impedance dynamics introduced by the CLC can still lead to malfunctions of the incremental quantity-based supervising elements. These theoretical findings are corroborated by simulations and controller hardware-in-the-loop (CHIL) tests.
Energy Efficient RSMA-Based LEO Satellite Communications Assisted by UAV-Mounted BD-Active RIS: A DRL Approach
This paper proposes an advanced non-terrestrial communication architecture that integrates Rate-Splitting Multiple Access (RSMA) with a Beyond-Diagonal Active Reconfigurable Intelligent Surface (BD-ARIS) mounted on a UAV under the coverage of a Low Earth Orbit (LEO) satellite. The BD-ARIS adopts a group-connected structure to enhance signal amplification and adaptability, while RSMA enables efficient multi-user access by dividing messages into common and private components. The system jointly optimizes satellite beamforming, UAV positioning, power allocation, and rate-splitting ratios to maximize the overall energy efficiency (EE). To solve the resulting non-convex and high-dimensional problem, we employ three state-of-the-art deep reinforcement learning (DRL) algorithms: Trust Region Policy Optimization (TRPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Asynchronous Advantage Actor-Critic (A3C). Moreover, realistic models for the power consumption of both the UAV and the BD-ARIS are considered. Simulation results reveal that TRPO consistently achieves the best performance in terms of EE and sum rate, especially under high transmit powers and challenging deployment scenarios. TD3 converges faster and performs competitively in moderate settings, while A3C suffers from instability due to its high variance. Additionally, the robustness of each algorithm under channel state information (CSI) uncertainty is evaluated, confirming TRPO resilience to imperfect observations. Overall, the proposed RSMA-BD-ARIS framework significantly outperforms conventional RIS-assisted designs and provides a scalable, energy-efficient solution for 6G and massive IoT applications in non-terrestrial networks.
Scalable 49-Channel Neural Recorder with an Event-Driven Ramp ADC and PCA Compression in 28 nm CMOS
Neural interfaces advance neuroscience research and therapeutic innovations by accurately measuring neuronal activity. However, recording raw data from numerous neurons results in substantial amount of data and poses challenges for wireless transmission. While conventional neural recorders consume energy to digitize and process the full neural signal, only a fraction of this data carries essential spiking information. Leveraging on this signal sparsity, this paper introduces a neural recording integrated circuit in TSMC 28nm CMOS. It features an event-driven ramp analog-to-digital converter, and a spike compression module based on principal component analysis. The circuit consists of 49 channels, each occupying an on-chip area of 50 $\times$ 60 $\mu$m$^2$. The circuit measures 1370 $\times$ 1370 $\mu$m$^2$ and consumes 534 $\mu$W. Compression testing on a synthetic dataset demonstrated an 8.8-fold reduction compared to raw spikes and a 328-fold reduction relative to the raw signal. This compression approach maintained a spike sorting accuracy of 74.9%, compared to the 79.5% accuracy obtained with the raw signal. The paper details the architecture and performance outcomes of the neural recording circuit and its compression module.
UX-aware Rate Allocation for Real-Time Media
Immersive communications is a key use case for 6G where applications require reliable latency-bound media traffic at a certain data rate to deliver an acceptable User Experience (UX) or Quality-of-Experience (QoE). The Quality-of-Service (QoS) framework of current cellular systems (4G and 5G) and prevalent network congestion control algorithms for latency-bound traffic like L4S typically target network-related Key Performance Indicators (KPIs) such as data rates and latencies. Network capacity is based on the number of users that attain these KPIs. However, the UX of an immersive application for a given data rate and latency is not the same across users, since it depends on other factors such as the complexity of the media being transmitted and the encoder format. This implies that guarantees on network KPIs do not necessarily translate to guarantees on the UX. In this paper, we propose a framework in which the communication network can provide guarantees on the UX. The framework requires application servers to share real-time information on UX dependency on data rate to the network, which in turn, uses this information to maximize a UX-based network utility function. Our framework is motivated by the recent industry trends of increasing application awareness at the network, and pushing application servers towards the edge, allowing for tighter coordination between the servers and the 6G system. Our simulation results show that the proposed framework substantially improves the UX capacity of the network, which is the number of users above a certain UX threshold, compared to conventional rate control algorithms.
Joint User Association and Bandwidth Assignment for Digital Twin-Assisted Multi-RAT Networks
In this paper, we investigate user equipment (UE)-radio access technology (RAT) association and bandwidth assignment to maximize sum-rates in a multi-RAT network. To this end, we formulate an optimization problem that jointly addresses UE association and bandwidth allocation, adhering to practical constraints. Because of the NP-hard nature of this problem, finding a globally optimal solution is computationally infeasible. To address this challenge, we propose a centralized and computationally efficient heuristic algorithm that aims to maximize sum-rates while enhancing quality of service (QoS). Yet, the proposed approach requires global channel state information (CSI) for near-optimal performance, which incurs substantial overhead and data collection costs in large-scale multi-RAT networks. To alleviate this burden, we use a digital twin (DT) of the multi-RAT network, leveraging its context-awareness to acquire global CSI with reduced overhead. Our numerical results reveal that our approach improves sum-rates by up to 43% over baseline method, with less than a 5% deviation from the theoretical optimal solution, while achieving up to a 43% improvement in QoS. Further analysis reveals that our method not only surpasses the optimal solution in terms of QoS enhancement, but also ensures significant computational efficiency.
comment: 6 pages, 6 figures, Accepted in IEEE ICC 2025
Geometric Fault-Tolerant Neural Network Tracking Control of Unknown Systems on Matrix Lie Groups
We present a geometric neural network-based tracking controller for systems evolving on matrix Lie groups under unknown dynamics, actuator faults, and bounded disturbances. Leveraging the left-invariance of the tangent bundle of matrix Lie groups, viewed as an embedded submanifold of the vector space $\R^{N\times N}$, we propose a set of learning rules for neural network weights that are intrinsically compatible with the Lie group structure and do not require explicit parameterization. Exploiting the geometric properties of Lie groups, this approach circumvents parameterization singularities and enables a global search for optimal weights. The ultimate boundedness of all error signals -- including the neural network weights, the coordinate-free configuration error function, and the tracking velocity error -- is established using Lyapunov's direct method. To validate the effectiveness of the proposed method, we provide illustrative simulation results for decentralized formation control of multi-agent systems on the Special Euclidean group.
VIMPPI: Enhancing Model Predictive Path Integral Control with Variational Integration for Underactuated Systems
This paper presents VIMPPI, a novel control approach for underactuated double pendulum systems developed for the AI Olympics competition. We enhance the Model Predictive Path Integral framework by incorporating variational integration techniques, enabling longer planning horizons without additional computational cost. Operating at 500-700 Hz with control interpolation and disturbance detection mechanisms, VIMPPI substantially outperforms both baseline methods and alternative MPPI implementations
Toward a Harmonized Approach - Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Generating Building-Level Heat Demand Time Series by Combining Occupancy Simulations and Thermal Modeling
Despite various efforts, decarbonizing the heating sector remains a significant challenge. To tackle it by smart planning, the availability of highly resolved heating demand data is key. Several existing models provide heating demand only for specific applications. Typically, they either offer time series for a larger area or annual demand data on a building level, but not both simultaneously. Additionally, the diversity in heating demand across different buildings is often not considered. To address these limitations, this paper presents a novel method for generating temporally resolved heat demand time series at the building level using publicly available data. The approach integrates a thermal building model with stochastic occupancy simulations that account for variability in user behavior. As a result, the tool serves as a cost-effective resource for cross-sectoral energy system planning and policy development, particularly with a focus on the heating sector. The obtained data can be used to assess the impact of renovation and retrofitting strategies, or to analyze district heating expansion. To illustrate the potential applications of this approach, we conducted a case study in Puertollano (Spain), where we prepared a dataset of heating demand with hourly resolution for each of 9,298 residential buildings. This data was then used to compare two different pathways for the thermal renovation of these buildings. By relying on publicly available data, this method can be adapted and applied to various European regions, offering broad usability in energy system optimization and analysis of decarbonization strategies.
Randomized Transport Plans via Hierarchical Fully Probabilistic Design
An optimal randomized strategy for design of balanced, normalized mass transport plans is developed. It replaces -- but specializes to -- the deterministic, regularized optimal transport (OT) strategy, which yields only a certainty-equivalent plan. The incompletely specified -- and therefore uncertain -- transport plan is acknowledged to be a random process. Therefore, hierarchical fully probabilistic design (HFPD) is adopted, yielding an optimal hyperprior supported on the set of possible transport plans, and consistent with prior mean constraints on the marginals of the uncertain plan. This Bayesian resetting of the design problem for transport plans -- which we call HFPD-OT -- confers new opportunities. These include (i) a strategy for the generation of a random sample of joint transport plans; (ii) randomized marginal contracts for individual source-target pairs; and (iii) consistent measures of uncertainty in the plan and its contracts. An application in fair market matching is outlined, in which HFPD-OT enables the recruitment of a more diverse subset of contracts -- than is possible in classical OT -- into the delivery of an expected plan.
comment: 25 pages, 26 figures
Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A star algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A star algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential in greatly improving the efficiency and safety when compared to the state of the art spline-based planning method for parking situations.
comment: IEEE Intelligent Vehicles Symposium 2025
Measurement-Based Line-Impedance Estimation in the Absence of Phasor Measurement Units
This paper proposes and compares experimentally several methods to estimate the series resistance and reactance (i.e., the transversal components of the $\pi$-model of a line) of low-voltage lines in distribution grids. It first shows that if phasor measurements are available and the grid nodal voltages and power injections are known, the problem can be formulated and solved as a conventional load flow with properly adjusted unknowns. To solve this problem, we propose an analytical derivation of the Jacobian matrix. If only RMS values are available, such as from smart meters, integrating information from multiple intervals becomes necessary, ultimately opening to least-squares estimations, widely adopted in the literature. In this context, applying the proposed Jacobian contributes to accelerating the problem resolution of existing algorithms. The methods are compared in terms of estimation performance and convergence by using measurements from an experimental distribution grid interfacing real-world components and with realistic size implemented at the Gridlab at HES-SO Valais.
comment: 2025 IEEE Kiel PowerTech
Spectrum Sharing in Satellite-Terrestrial Integrated Networks: Frameworks, Approaches, and Opportunities
To accommodate the increasing communication needs in non-terrestrial networks (NTNs), wireless users in remote areas may require access to more spectrum than is currently allocated. Terrestrial networks (TNs), such as cellular networks, are deployed in specific areas, but many underused licensed spectrum bands remain in remote areas. Therefore, bringing NTNs to a shared spectrum with TNs can improve network capacity under reasonable interference management. However, in satellite-terrestrial integrated networks (STINs), the comprehensive coverage of a satellite and the unbalanced communication resources of STINs make it challenging to effectively manage mutual interference between NTN and TN. This article presents the fundamentals and prospects of spectrum sharing (SS) in STINs by introducing four SS frameworks, their potential application scenarios, and technical challenges. Furthermore, advanced SS approaches related to interference management in STINs and performance metrics of SS in STINs are introduced. Moreover, a preliminary performance evaluation showcases the potential for sharing the spectrum between NTN and TN. Finally, future research opportunities for SS in STINs are discussed.
Set-based and Dynamical Feedback-augmented Hands-off Control
A novel set-theoretical approach to hands-off control is proposed, which focuses on spatial arguments for command limitation, rather than temporal ones. By employing dynamical feedback alongside invariant set-based constraints, actuation is employed only to drive the system's state inside a "hands-off region" of its state-space, where the plant may freely evolve in open-loop configuration. A computationally-efficient procedure with strong theoretical guarantees is devised, and its effectiveness is showcased via an intuitive practical example.
comment: 10 pages, 4 figures
Design of a compact low loss 2-way millimetre wave power divider for future communication
In this paper, a rectangular-shaped power divider has been presented operating at 27.9 GHz. The power divider has achieved acceptable results for important parameters such as S11, S12, S21, and S22. The substrate employed for the power divider is Roger 3003 which has a thickness of 1.6 mm. This power divider provides a reflection coefficient of -12.2 dB and an insertion loss of 3.1 dB at 28 GHz. This ka-band T-junction power divider covers 68% of the bandwidth. Dimensions of the ka-band T-junction power divider are 50x80 mm. Due to its dimensions and bandwidth this power divider is more suitable for millimetre wave applications like RADAR, beamforming, and 5G applications.
comment: 7 pages, 6 figures, 2 tables
Toward a Harmonized Approach -- Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Disturbance-adaptive Model Predictive Control for Bounded Average Constraint Violations
This paper considers stochastic linear time-invariant systems subject to constraints on the average number of state-constraint violations over time without knowing the disturbance distribution. We present a novel disturbance-adaptive model predictive control (DAD-MPC) framework, which adjusts the disturbance model based on measured constraint violations. Using a robust invariance method, DAD-MPC ensures recursive feasibility and guarantees asymptotic or robust bounds on average constraint violations. Additionally, the bounds hold even with an inaccurate disturbance model, which allows for data-driven disturbance quantification methods to be used, such as conformal prediction. Simulation results demonstrate that the proposed approach outperforms state-of-the-art methods while satisfying average violation constraints.
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon, discrete, discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound matching our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
Introducing Modelling, Analysis and Control of Three-Phase Electrical Systems Using Geometric Algebra
State-of-the-art techniques for modeling, analysis and control of three-phase electrical systems belong to the real-valued multi-input/multi-output (MIMO) domain, or to the complex-valued nonlinear single-input/single-output (SISO) domain. In order to complement both domains while simplifying complexity and offering new analysis and design perspectives, this paper introduces the application of geometric algebra (GA) principles to the modeling, analysis and control of three-phase electrical systems. The key contribution for the modeling part is the identification of the transformation that allows transferring real-valued linear MIMO systems into GA-valued linear SISO representations (with independence of having a balanced or unbalanced system). Closed-loop stability analysis in the new space is addressed by using intrinsic properties of GA. In addition, a recipe for designing stabilizing and decoupling GA-valued controllers is provided. Numerical examples illustrate key developments and experiments corroborate the main findings.
comment: 10 pages, 5 figures, final version of an accepted journal paper (without the journal format)
A Unified Approach to Enforce Non-Negativity Constraint in Neural Network Approximation for Optimal Voltage Regulation (preprint)
Power system voltage regulation is crucial to maintain power quality while integrating intermittent renewable resources in distribution grids. However, the system model on the grid edge is often unknown, making it difficult to model physical equations for optimal control. Therefore, previous work proposes structured data-driven methods like input convex neural networks (ICNN) for "optimal" control without relying on a physical model. While ICNNs offer theoretical guarantees based on restrictive assumptions of non-negative neural network parameters, can one improve the approximation power with an extra step on negative duplication of inputs? We show that such added mirroring step fails to improve accuracy, as a linear combination of the original input and duplicated input is equivalent to a linear operation of ICNN's input without duplication. While this design can not improve performance, we propose a unified approach to embed the non-negativity constraint as a regularized optimization of the neural network, contrary to the existing methods, which added a loosely integrated second step for post-processing on parameter negation. Our integration directly ties back-propagation to simultaneously minimizing the approximation error while enforcing the convexity constraints. Numerical experiments validate the issues of the mirroring method and show that our integrated objective can avoid problems such as unstable training and non-convergence existing in other methods for optimal control.
comment: Submitted to the 58th Hawaii International Conference on System Sciences (HICSS-58)
Neural Operators for Predictor Feedback Control of Nonlinear Delay Systems
Predictor feedback designs are critical for delay-compensating controllers in nonlinear systems. However, these designs are limited in practical applications as predictors cannot be directly implemented, but require numerical approximation schemes, which become computationally prohibitive when system dynamics are expensive to compute. To address this challenge, we recast the predictor design as an operator learning problem, and learn the predictor mapping via a neural operator. We prove the existence of an arbitrarily accurate neural operator approximation of the predictor operator. Under the approximated predictor, we achieve semiglobal practical stability of the closed-loop nonlinear delay system. The estimate is semiglobal in a unique sense - one can enlarge the set of initial states as desired, though this increases the difficulty of training a neural operator, which appears practically in the stability estimate. Furthermore, our analysis holds for any black-box predictor satisfying the universal approximation error bound. We demonstrate the approach by controlling a 5-link robotic manipulator with different neural operator models, achieving significant speedups compared to classic predictor feedback schemes while maintaining closed-loop stability.
comment: 26 pages. Learning for Dynamics and Control 2025
Proximal Gradient Dynamics and Feedback Control for Equality-Constrained Composite Optimization
This paper studies equality-constrained composite minimization problems. This class of problems, capturing regularization terms and convex inequality constraints, naturally arises in a wide range of engineering and machine learning applications. To tackle these minimization problems, we introduce the \emph{proportional--integral proximal gradient dynamics} (PI--PGD): a closed-loop system where the Lagrange multipliers are control inputs and states are the problem decision variables. First, we establish the equivalence between the minima of the optimization problem and the equilibria of the PI--PGD. Then {for the case of affine constraints}, {by} leveraging tools from contraction theory we give a comprehensive convergence analysis for the dynamics, showing linear--exponential convergence towards the equilibrium. That is, the distance between each solution and the equilibrium is upper bounded by a function that first decreases linearly and then exponentially. Our findings are illustrated numerically on a set of representative examples, which include an application to entropy-regularized optimal transport.
comment: 14 pages, 7 figures
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front ICLR 2025
Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previous dominating MORL methods typically generate a fixed policy set or preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure the efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimension of the state and preference space grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each optimized towards its individual preference over the multiple objectives. Then, to fill the remaining vacancies in the Pareto front, the constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent advancements in MORL methods, our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).
comment: Published as a conference paper at ICLR 2025. Code available at https://github.com/RuohLiuq/C-MORL
Multiagent Systems
Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions IJCAI 2025
In collaborative tasks, autonomous agents fall short of humans in their capability to quickly adapt to new and unfamiliar teammates. We posit that a limiting factor for zero-shot coordination is the lack of shared task abstractions, a mechanism humans rely on to implicitly align with teammates. To address this gap, we introduce HA$^2$: Hierarchical Ad Hoc Agents, a framework leveraging hierarchical reinforcement learning to mimic the structured approach humans use in collaboration. We evaluate HA$^2$ in the Overcooked environment, demonstrating statistically significant improvement over existing baselines when paired with both unseen agents and humans, providing better resilience to environmental shifts, and outperforming all state-of-the-art methods.
comment: 9 pages (7 paper + 2 references). To be published in IJCAI 2025
Runtime Advocates: A Persona-Driven Framework for Requirements@Runtime Decision Support
Complex systems, such as small Uncrewed Aerial Systems (sUAS) swarms dispatched for emergency response, often require dynamic reconfiguration at runtime under the supervision of human operators. This introduces human-on-the-loop requirements, where evolving needs shape ongoing system functionality and behaviors. While traditional personas support upfront, static requirements elicitation, we propose a persona-based advocate framework for runtime requirements engineering to provide ethically informed, safety-driven, and regulatory-aware decision support. Our approach extends standard personas into event-driven personas. When triggered by events such as adverse environmental conditions, evolving mission state, or operational constraints, the framework updates the sUAS operator's view of the personas, ensuring relevance to current conditions. We create three key advocate personas, namely Safety Controller, Ethical Governor, and Regulatory Auditor, to manage trade-offs among risk, ethical considerations, and regulatory compliance. We perform a proof-of-concept validation in an emergency response scenario using sUAS, showing how our advocate personas provide context-aware guidance grounded in safety, regulatory, and ethical constraints. By evolving static, design-time personas into adaptive, event-driven advocates, the framework surfaces mission-critical runtime requirements in response to changing conditions. These requirements shape operator decisions in real time, aligning actions with the operational demands of the moment.
comment: 7 pages, 5 figures. Submitted to 33rd IEEE International Requirements Engineering 2025 conference
Consensus-Aware AV Behavior: Trade-offs Between Safety, Interaction, and Performance in Mixed Urban Traffic
Transportation systems have long been shaped by complexity and heterogeneity, driven by the interdependency of agent actions and traffic outcomes. The deployment of automated vehicles (AVs) in such systems introduces a new challenge: achieving consensus across safety, interaction quality, and traffic performance. In this work, we position consensus as a fundamental property of the traffic system and aim to quantify it. We use high-resolution trajectory data from the Third Generation Simulation (TGSIM) dataset to empirically analyze AV and human-driven vehicle (HDV) behavior at a signalized urban intersection and around vulnerable road users (VRUs). Key metrics, including Time-to-Collision (TTC), Post-Encroachment Time (PET), deceleration patterns, headways, and string stability, are evaluated across the three performance dimensions. Results show that full consensus across safety, interaction, and performance is rare, with only 1.63% of AV-VRU interaction frames meeting all three conditions. These findings highlight the need for AV models that explicitly balance multi-dimensional performance in mixed-traffic environments. Full reproducibility is supported via our open-source codebase on https://github.com/wissamkontar/Consensus-AV-Analysis.
comment: 7 pages, 8 figures
Benchmarking LLMs' Swarm intelligence
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict constraints-such as limited local perception and communication, characteristic of natural swarms-remains largely unexplored, particularly concerning the nuances of swarm intelligence. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination that arise when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks within a configurable 2D grid environment, forcing agents to rely primarily on local sensory input (k x k view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Evaluating several leading LLMs in a zero-shot setting, we find significant performance variations across tasks, highlighting the difficulties posed by local information constraints. While some coordination emerges, results indicate limitations in robust planning and strategy formation under uncertainty in these decentralized scenarios. Assessing LLMs under swarm-like conditions is crucial for realizing their potential in future decentralized systems. We release SwarmBench as an open, extensible toolkit-built upon a customizable and scalable physical system with defined mechanical properties. It provides environments, prompts, evaluation scripts, and the comprehensive experimental datasets generated, aiming to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of Embodied MAS. Our code repository is available at https://github.com/x66ccff/swarmbench.
Facilitating Trustworthy Human-Agent Collaboration in LLM-based Multi-Agent System oriented Software Engineering
Multi-agent autonomous systems (MAS) are better at addressing challenges that spans across multiple domains than singular autonomous agents. This holds true within the field of software engineering (SE) as well. The state-of-the-art research on MAS within SE focuses on integrating LLMs at the core of autonomous agents to create LLM-based multi-agent autonomous (LMA) systems. However, the introduction of LMA systems into SE brings a plethora of challenges. One of the major challenges is the strategic allocation of tasks between humans and the LMA system in a trustworthy manner. To address this challenge, a RACI-based framework is proposed in this work in progress article, along with implementation guidelines and an example implementation of the framework. The proposed framework can facilitate efficient collaboration, ensure accountability, and mitigate potential risks associated with LLM-driven automation while aligning with the Trustworthy AI guidelines. The future steps for this work delineating the planned empirical validation method are also presented.
Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections
Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging global perception and vehicle-to-infrastructure (V2I) communication. The core of the system is an RSU-based decision-making module using a two-stage hybrid reinforcement learning (RL) framework. At first, policies are pre-trained offline using conservative Q-learning (CQL) combined with behavior cloning (BC) on collected dataset. Subsequently, these policies are fine-tuned in the simulation using multi-agent proximal policy optimization (MAPPO), aligned with a self-attention mechanism to effectively solve inter-agent dependencies. RSUs perform real-time inference based on the trained models to realize vehicle control via V2I communications. Extensive experiments in CARLA environment demonstrate high effectiveness of the proposed system, by: \textit{(i)} achieving failure rates below 0.03\% in coordinating three connected and autonomous vehicles (CAVs) through complex intersection scenarios, significantly outperforming the traditional Autoware control method, and \textit{(ii)} exhibiting strong robustness across varying numbers of controlled agents and shows promising generalization capabilities on other maps.
comment: 7 pages
A Large Language Model for Feasible and Diverse Population Synthesis
Generating a synthetic population that is both feasible and diverse is crucial for ensuring the validity of downstream activity schedule simulation in activity-based models (ABMs). While deep generative models (DGMs), such as variational autoencoders and generative adversarial networks, have been applied to this task, they often struggle to balance the inclusion of rare but plausible combinations (i.e., sampling zeros) with the exclusion of implausible ones (i.e., structural zeros). To improve feasibility while maintaining diversity, we propose a fine-tuning method for large language models (LLMs) that explicitly controls the autoregressive generation process through topological orderings derived from a Bayesian Network (BN). Experimental results show that our hybrid LLM-BN approach outperforms both traditional DGMs and proprietary LLMs (e.g., ChatGPT-4o) with few-shot learning. Specifically, our approach achieves approximately 95% feasibility, significantly higher than the ~80% observed in DGMs, while maintaining comparable diversity, making it well-suited for practical applications. Importantly, the method is based on a lightweight open-source LLM, enabling fine-tuning and inference on standard personal computing environments. This makes the approach cost-effective and scalable for large-scale applications, such as synthesizing populations in megacities, without relying on expensive infrastructure. By initiating the ABM pipeline with high-quality synthetic populations, our method improves overall simulation reliability and reduces downstream error propagation. The source code for these methods is available for research and practical application.
comment: 28 pages, 7 figures, 6 tables. Submitted to Transportation Research Part C: Emerging Technologies. Preprint version
Optimization of Infectious Disease Intervention Measures Based on Reinforcement Learning -- Empirical analysis based on UK COVID-19 epidemic data
Globally, the outbreaks of infectious diseases have exerted an extremely profound and severe influence on health security and the economy. During the critical phases of epidemics, devising effective intervention measures poses a significant challenge to both the academic and practical arenas. There is numerous research based on reinforcement learning to optimize intervention measures of infectious diseases. Nevertheless, most of these efforts have been confined within the differential equation based on infectious disease models. Although a limited number of studies have incorporated reinforcement learning methodologies into individual-based infectious disease models, the models employed therein have entailed simplifications and limitations, rendering it incapable of modeling the complexity and dynamics inherent in infectious disease transmission. We establish a decision-making framework based on an individual agent-based transmission model, utilizing reinforcement learning to continuously explore and develop a strategy function. The framework's validity is verified through both experimental and theoretical approaches. Covasim, a detailed and widely used agent-based disease transmission model, was modified to support reinforcement learning research. We conduct an exhaustive exploration of the application efficacy of multiple algorithms across diverse action spaces. Furthermore, we conduct an innovative preliminary theoretical analysis concerning the issue of "time coverage". The results of the experiment robustly validate the effectiveness and feasibility of the methodological framework of this study. The coping strategies gleaned therefrom prove highly efficacious in suppressing the expansion of the epidemic scale and safeguarding the stability of the economic system, thereby providing crucial reference perspectives for the formulation of global public health security strategies.
Conceptual Logical Foundations of Artificial Social Intelligence
What makes a society possible at all? How is coordination and cooperation in social activity possible? What is the minimal mental architecture of a social agent? How is the information about the state of the world related to the agents intentions? How are the intentions of agents related? What role does communication play in this coordination process? This essay explores the conceptual and logical foundations of artificial social intelligence in the context of a society of multiple agents that communicate and cooperate to achieve some end. An attempt is made to provide an introduction to some of the key concepts, their formal definitions and their interrelationships. These include the notion of a changing social world of multiple agents. The logic of social intelligence goes beyond classical logic by linking information with strategic thought. A minimal architecture of social agents is presented. The agents have different dynamically changing, possible choices and abilities. The agents also have uncertainty, lacking perfect information about their physical state as well as their dynamic social state. The social state of an agent includes the intentional state of that agent, as well as, that agent's representation of the intentional states of other agents. Furthermore, it includes the evaluations agents make of their physical and social condition. Communication, semantic and pragmatic meaning and their relationship to intention and information states are investigated. The logic of agent abilities and intentions are motivated and formalized. The entropy of group strategic states is defined.
Multi-Robot Motion Planning with Diffusion Models ICLR 2025
Diffusion models have recently been successfully applied to a wide range of robotics applications for learning complex multi-modal behaviors from data. However, prior works have mostly been confined to single-robot and small-scale environments due to the high sample complexity of learning multi-robot diffusion models. In this paper, we propose a method for generating collision-free multi-robot trajectories that conform to underlying data distributions while using only single-robot data. Our algorithm, Multi-robot Multi-model planning Diffusion (MMD), does so by combining learned diffusion models with classical search-based techniques -- generating data-driven motions under collision constraints. Scaling further, we show how to compose multiple diffusion models to plan in large environments where a single diffusion model fails to generalize well. We demonstrate the effectiveness of our approach in planning for dozens of robots in a variety of simulated scenarios motivated by logistics environments. View video demonstrations and code at: https://multi-robot-diffusion.github.io/.
comment: The first three authors contributed equally to this work. Published at ICLR 2025
Simplification of Robotic System Model Analysis by Petri Net Meta-Model Property Transfer
This paper presents a simplification of robotic system model analysis due to the transfer of Robotic System Hierarchical Petri Net (RSHPN) meta-model properties onto the model of a designed system. Key contributions include: 1) analysis of RSHPN meta-model properties; 2) decomposition of RSHPN analysis into analysis of individual Petri nets, thus the reduction of state space explosion; and 3) transfer of RSHPN meta-model properties onto the produced models, hence elimination of the need for full re-analysis of the RSHPN model when creating new robotic systems. Only task-dependent parts of the model need to be analysed. This approach streamlines the analysis thus reducing the design time. Moreover, it produces a specification which is a solid foundation for the implementation of the system. The obtained results highlight the potential of Petri nets as a valuable formal framework for analysing robotic system properties.
comment: 16 pages
Robotics
AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control
Humanoid robots derive much of their dexterity from hyper-dexterous whole-body movements, enabling tasks that require a large operational workspace: such as picking objects off the ground. However, achieving these capabilities on real humanoids remains challenging due to their high degrees of freedom (DoF) and nonlinear dynamics. We propose Adaptive Motion Optimization (AMO), a framework that integrates sim-to-real reinforcement learning (RL) with trajectory optimization for real-time, adaptive whole-body control. To mitigate distribution bias in motion imitation RL, we construct a hybrid AMO dataset and train a network capable of robust, on-demand adaptation to potentially O.O.D. commands. We validate AMO in simulation and on a 29-DoF Unitree G1 humanoid robot, demonstrating superior stability and an expanded workspace compared to strong baselines. Finally, we show that AMO's consistent performance supports autonomous task execution via imitation learning, underscoring the system's versatility and robustness.
comment: website: https://amo-humanoid.github.io
Visual Imitation Enables Contextual Humanoid Control
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them-casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills-all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
comment: Project website: https://www.videomimic.net/
PyRoki: A Modular Toolkit for Robot Kinematic Optimization
Robot motion can have many goals. Depending on the task, we might optimize for pose error, speed, collision, or similarity to a human demonstration. Motivated by this, we present PyRoki: a modular, extensible, and cross-platform toolkit for solving kinematic optimization problems. PyRoki couples an interface for specifying kinematic variables and costs with an efficient nonlinear least squares optimizer. Unlike existing tools, it is also cross-platform: optimization runs natively on CPU, GPU, and TPU. In this paper, we present (i) the design and implementation of PyRoki, (ii) motion retargeting and planning case studies that highlight the advantages of PyRoki's modularity, and (iii) optimization benchmarking, where PyRoki can be 1.4-1.7x faster and converges to lower errors than cuRobo, an existing GPU-accelerated inverse kinematics library.
comment: First two authors contributed equally. Code is available at https://pyroki-toolkit.github.io
Meta-Optimization and Program Search using Language Models for Task and Motion Planning
Intelligent interaction with the real world requires robotic agents to jointly reason over high-level plans and low-level controls. Task and motion planning (TAMP) addresses this by combining symbolic planning and continuous trajectory generation. Recently, foundation model approaches to TAMP have presented impressive results, including fast planning times and the execution of natural language instructions. Yet, the optimal interface between high-level planning and low-level motion generation remains an open question: prior approaches are limited by either too much abstraction (e.g., chaining simplified skill primitives) or a lack thereof (e.g., direct joint angle prediction). Our method introduces a novel technique employing a form of meta-optimization to address these issues by: (i) using program search over trajectory optimization problems as an interface between a foundation model and robot control, and (ii) leveraging a zero-order method to optimize numerical parameters in the foundation model output. Results on challenging object manipulation and drawing tasks confirm that our proposed method improves over prior TAMP approaches.
comment: 20 pages, 8 figures, under review for the 9th Annual Conference on Robot Learning (CoRL 2025)
Self-Supervised Learning for Robotic Leaf Manipulation: A Hybrid Geometric-Neural Approach
Automating leaf manipulation in agricultural settings faces significant challenges, including the variability of plant morphologies and deformable leaves. We propose a novel hybrid geometric-neural approach for autonomous leaf grasping that combines traditional computer vision with neural networks through self-supervised learning. Our method integrates YOLOv8 for instance segmentation and RAFT-Stereo for 3D depth estimation to build rich leaf representations, which feed into both a geometric feature scoring pipeline and a neural refinement module (GraspPointCNN). The key innovation is our confidence-weighted fusion mechanism that dynamically balances the contribution of each approach based on prediction certainty. Our self-supervised framework uses the geometric pipeline as an expert teacher to automatically generate training data. Experiments demonstrate that our approach achieves an 88.0% success rate in controlled environments and 84.7% in real greenhouse conditions, significantly outperforming both purely geometric (75.3%) and neural (60.2%) methods. This work establishes a new paradigm for agricultural robotics where domain expertise is seamlessly integrated with machine learning capabilities, providing a foundation for fully automated crop monitoring systems.
comment: 13 pages, 9 figures
Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving
Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have widely been adopted for autonomous driving applications. While a global route can be pre-computed offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.
comment: 8 pages, 10 figures - Presented at 2025 IEEE 36th Intelligent Vehicles Symposium (IV)
Demonstrating ViSafe: Vision-enabled Safety for High-speed Detect and Avoid
Assured safe-separation is essential for achieving seamless high-density operation of airborne vehicles in a shared airspace. To equip resource-constrained aerial systems with this safety-critical capability, we present ViSafe, a high-speed vision-only airborne collision avoidance system. ViSafe offers a full-stack solution to the Detect and Avoid (DAA) problem by tightly integrating a learning-based edge-AI framework with a custom multi-camera hardware prototype designed under SWaP-C constraints. By leveraging perceptual input-focused control barrier functions (CBF) to design, encode, and enforce safety thresholds, ViSafe can provide provably safe runtime guarantees for self-separation in high-speed aerial operations. We evaluate ViSafe's performance through an extensive test campaign involving both simulated digital twins and real-world flight scenarios. By independently varying agent types, closure rates, interaction geometries, and environmental conditions (e.g., weather and lighting), we demonstrate that ViSafe consistently ensures self-separation across diverse scenarios. In first-of-its-kind real-world high-speed collision avoidance tests with closure rates reaching 144 km/h, ViSafe sets a new benchmark for vision-only autonomous collision avoidance, establishing a new standard for safety in high-speed aerial navigation.
comment: 13 pages, RSS 2025 Demo track
Matching Distance and Geometric Distribution Aided Learning Multiview Point Cloud Registration
Multiview point cloud registration plays a crucial role in robotics, automation, and computer vision fields. This paper concentrates on pose graph construction and motion synchronization within multiview registration. Previous methods for pose graph construction often pruned fully connected graphs or constructed sparse graph using global feature aggregated from local descriptors, which may not consistently yield reliable results. To identify dependable pairs for pose graph construction, we design a network model that extracts information from the matching distance between point cloud pairs. For motion synchronization, we propose another neural network model to calculate the absolute pose in a data-driven manner, rather than optimizing inaccurate handcrafted loss functions. Our model takes into account geometric distribution information and employs a modified attention mechanism to facilitate flexible and reliable feature interaction. Experimental results on diverse indoor and outdoor datasets confirm the effectiveness and generalizability of our approach. The source code is available at https://github.com/Shi-Qi-Li/MDGD.
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
The dawn of embodied intelligence has ushered in an unprecedented imperative for resilient, cognition-enabled multi-agent collaboration across next-generation ecosystems, revolutionizing paradigms in autonomous manufacturing, adaptive service robotics, and cyber-physical production architectures. However, current robotic systems face significant limitations, such as limited cross-embodiment adaptability, inefficient task scheduling, and insufficient dynamic error correction. While End-to-end VLA models demonstrate inadequate long-horizon planning and task generalization, hierarchical VLA models suffer from a lack of cross-embodiment and multi-agent coordination capabilities. To address these challenges, we introduce RoboOS, the first open-source embodied system built on a Brain-Cerebellum hierarchical architecture, enabling a paradigm shift from single-agent to multi-agent intelligence. Specifically, RoboOS consists of three key components: (1) Embodied Brain Model (RoboBrain), a MLLM designed for global perception and high-level decision-making; (2) Cerebellum Skill Library, a modular, plug-and-play toolkit that facilitates seamless execution of multiple skills; and (3) Real-Time Shared Memory, a spatiotemporal synchronization mechanism for coordinating multi-agent states. By integrating hierarchical information flow, RoboOS bridges Embodied Brain and Cerebellum Skill Library, facilitating robust planning, scheduling, and error correction for long-horizon tasks, while ensuring efficient multi-agent collaboration through Real-Time Shared Memory. Furthermore, we enhance edge-cloud communication and cloud-based distributed inference to facilitate high-frequency interactions and enable scalable deployment. Extensive real-world experiments across various scenarios, demonstrate RoboOS's versatility in supporting heterogeneous embodiments. Project website: https://github.com/FlagOpen/RoboOS
comment: 22 pages, 10 figures
Meta-reasoning Using Attention Maps and Its Applications in Cloud Robotics
Metareasoning, a branch of AI, focuses on reasoning about reasons. It has the potential to enhance robots' decision-making processes in unexpected situations. However, the concept has largely been confined to theoretical discussions and case-by-case investigations, lacking general and practical solutions when the Value of Computation (VoC) is undefined, which is common in unexpected situations. In this work, we propose a revised meta-reasoning framework that significantly improves the scalability of the original approach in unexpected situations. This is achieved by incorporating semantic attention maps and unsupervised 'attention' updates into the metareasoning processes. To accommodate environmental dynamics, 'lines of thought' are used to bridge context-specific objects with abstracted attentions, while meta-information is monitored and controlled at the meta-level for effective reasoning. The practicality of the proposed approach is demonstrated through cloud robots deployed in real-world scenarios, showing improved performance and robustness.
Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions
Despite significant progress in autonomous navigation, a critical gap remains in ensuring reliable localization in hazardous environments such as tunnels, urban disaster zones, and underground structures. Tunnels present a uniquely difficult scenario: they are not only prone to GNSS signal loss, but also provide little features for visual localization due to their repetitive walls and poor lighting. These conditions degrade conventional vision-based and LiDAR-based systems, which rely on distinguishable environmental features. To address this, we propose a novel sensor fusion framework that integrates a thermal camera with a LiDAR to enable robust localization in tunnels and other perceptually degraded environments. The thermal camera provides resilience in low-light or smoke conditions, while the LiDAR delivers precise depth perception and structural awareness. By combining these sensors, our framework ensures continuous and accurate localization across diverse and dynamic environments. We use an Extended Kalman Filter (EKF) to fuse multi-sensor inputs, and leverages visual odometry and SLAM (Simultaneous Localization and Mapping) techniques to process the sensor data, enabling robust motion estimation and mapping even in GNSS-denied environments. This fusion of sensor modalities not only enhances system resilience but also provides a scalable solution for cyber-physical systems in connected and autonomous vehicles (CAVs). To validate the framework, we conduct tests in a tunnel environment, simulating sensor degradation and visibility challenges. The results demonstrate that our method sustains accurate localization where standard approaches deteriorate due to the tunnels featureless geometry. The frameworks versatility makes it a promising solution for autonomous vehicles, inspection robots, and other cyber-physical systems operating in constrained, perceptually poor environments.
comment: Submitted to IAVVC 2025
Panoramic Out-of-Distribution Segmentation
Panoramic imaging enables capturing 360{\deg} images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to background clutter and pixel distortions. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), achieving OoS for panoramas. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. Besides, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.
comment: Code and datasets will be available at https://github.com/MengfeiD/PanOoS
Automated Action Generation based on Action Field for Robotic Garment Manipulation
Garment manipulation using robotic systems is a challenging task due to the diverse shapes and deformable nature of fabric. In this paper, we propose a novel method for robotic garment manipulation that significantly improves the accuracy while reducing computational time compared to previous approaches. Our method features an action generator that directly interprets scene images and generates pixel-wise end-effector action vectors using a neural network. The network also predicts a manipulation score map that ranks potential actions, allowing the system to select the most effective action. Extensive simulation experiments demonstrate that our method achieves higher unfolding and alignment performances and faster computation time than previous approaches. Real-world experiments show that the proposed method generalizes well to different garment types and successfully flattens garments.
Artificial Protozoa Optimizer (APO): A novel bio-inspired metaheuristic algorithm for engineering optimization
This study proposes a novel artificial protozoa optimizer (APO) that is inspired by protozoa in nature. The APO mimics the survival mechanisms of protozoa by simulating their foraging, dormancy, and reproductive behaviors. The APO was mathematically modeled and implemented to perform the optimization processes of metaheuristic algorithms. The performance of the APO was verified via experimental simulations and compared with 32 state-of-the-art algorithms. Wilcoxon signed-rank test was performed for pairwise comparisons of the proposed APO with the state-of-the-art algorithms, and Friedman test was used for multiple comparisons. First, the APO was tested using 12 functions of the 2022 IEEE Congress on Evolutionary Computation benchmark. Considering practicality, the proposed APO was used to solve five popular engineering design problems in a continuous space with constraints. Moreover, the APO was applied to solve a multilevel image segmentation task in a discrete space with constraints. The experiments confirmed that the APO could provide highly competitive results for optimization problems. The source codes of Artificial Protozoa Optimizer are publicly available at https://seyedalimirjalili.com/projects and https://ww2.mathworks.cn/matlabcentral/fileexchange/162656-artificial-protozoa-optimizer.
Task Reconstruction and Extrapolation for $π_0$ using Text Latent
Vision-language-action models (VLAs) often achieve high performance on demonstrated tasks but struggle significantly when required to extrapolate, combining skills learned from different tasks in novel ways. For instance, VLAs might successfully put the cream cheese in the bowl and put the bowl on top of the cabinet, yet still fail to put the cream cheese on top of the cabinet. In this work, we demonstrate that behaviors from distinct tasks can be effectively recombined by manipulating the VLA's internal representations at inference time. Concretely, we identify the text latent by averaging the text tokens' hidden states across all demonstrated trajectories for a specific base task. For executing an extrapolated task, we can temporally interpolate the text latent of the two base tasks and add it back to the text hidden states, so sub-behaviors from the two tasks will be activated sequentially. We evaluate this approach using the newly created libero-ood benchmark, featuring 20 tasks extrapolated from standard LIBERO suites. The results on libero-ood show that all SOTA VLAs achieve < 15% success rate, while $\pi0$ with text latent interpolation reaches an 83% success rate. Further qualitative analysis reveals a tendency for VLAs to exhibit spatial overfitting, mapping object names to demonstrated locations rather than achieving genuine object and goal understanding. Additionally, we find that decoding the text latent yields human-unreadable prompts that can nevertheless instruct the VLA to achieve a 70% success rate on standard LIBERO suites, enabling private instruction or backdoor attacks.
LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs
The growing demand for intelligent logistics, particularly fine-grained terminal delivery, underscores the need for autonomous UAV (Unmanned Aerial Vehicle)-based delivery systems. However, most existing last-mile delivery studies rely on ground robots, while current UAV-based Vision-Language Navigation (VLN) tasks primarily focus on coarse-grained, long-range goals, making them unsuitable for precise terminal delivery. To bridge this gap, we propose LogisticsVLN, a scalable aerial delivery system built on multimodal large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN integrates lightweight Large Language Models (LLMs) and Visual-Language Models (VLMs) in a modular pipeline for request understanding, floor localization, object detection, and action-decision making. To support research and evaluation in this new setting, we construct the Vision-Language Delivery (VLD) dataset within the CARLA simulator. Experimental results on the VLD dataset showcase the feasibility of the LogisticsVLN system. In addition, we conduct subtask-level evaluations of each module of our system, offering valuable insights for improving the robustness and real-world deployment of foundation model-based vision-language delivery systems.
AquaticVision: Benchmarking Visual SLAM in Underwater Environment with Events and Frames
Many underwater applications, such as offshore asset inspections, rely on visual inspection and detailed 3D reconstruction. Recent advancements in underwater visual SLAM systems for aquatic environments have garnered significant attention in marine robotics research. However, existing underwater visual SLAM datasets often lack groundtruth trajectory data, making it difficult to objectively compare the performance of different SLAM algorithms based solely on qualitative results or COLMAP reconstruction. In this paper, we present a novel underwater dataset that includes ground truth trajectory data obtained using a motion capture system. Additionally, for the first time, we release visual data that includes both events and frames for benchmarking underwater visual positioning. By providing event camera data, we aim to facilitate the development of more robust and advanced underwater visual SLAM algorithms. The use of event cameras can help mitigate challenges posed by extremely low light or hazy underwater conditions. The webpage of our dataset is https://sites.google.com/view/aquaticvision-lias.
LiftFeat: 3D Geometry-Aware Local Feature Matching ICRA 2025
Robust and efficient local feature matching plays a crucial role in applications such as SLAM and visual localization for robotics. Despite great progress, it is still very challenging to extract robust and discriminative visual features in scenarios with drastic lighting changes, low texture areas, or repetitive patterns. In this paper, we propose a new lightweight network called \textit{LiftFeat}, which lifts the robustness of raw descriptor by aggregating 3D geometric feature. Specifically, we first adopt a pre-trained monocular depth estimation model to generate pseudo surface normal label, supervising the extraction of 3D geometric feature in terms of predicted surface normal. We then design a 3D geometry-aware feature lifting module to fuse surface normal feature with raw 2D descriptor feature. Integrating such 3D geometric feature enhances the discriminative ability of 2D feature description in extreme conditions. Extensive experimental results on relative pose estimation, homography estimation, and visual localization tasks, demonstrate that our LiftFeat outperforms some lightweight state-of-the-art methods. Code will be released at : https://github.com/lyp-deeplearning/LiftFeat.
comment: Accepted at ICRA 2025
Close-Fitting Dressing Assistance Based on State Estimation of Feet and Garments with Semantic-based Visual Attention
As the population continues to age, a shortage of caregivers is expected in the future. Dressing assistance, in particular, is crucial for opportunities for social participation. Especially dressing close-fitting garments, such as socks, remains challenging due to the need for fine force adjustments to handle the friction or snagging against the skin, while considering the shape and position of the garment. This study introduces a method uses multi-modal information including not only robot's camera images, joint angles, joint torques, but also tactile forces for proper force interaction that can adapt to individual differences in humans. Furthermore, by introducing semantic information based on object concepts, rather than relying solely on RGB data, it can be generalized to unseen feet and background. In addition, incorporating depth data helps infer relative spatial relationship between the sock and the foot. To validate its capability for semantic object conceptualization and to ensure safety, training data were collected using a mannequin, and subsequent experiments were conducted with human subjects. In experiments, the robot successfully adapted to previously unseen human feet and was able to put socks on 10 participants, achieving a higher success rate than Action Chunking with Transformer and Diffusion Policy. These results demonstrate that the proposed model can estimate the state of both the garment and the foot, enabling precise dressing assistance for close-fitting garments.
Effective Reinforcement Learning Control using Conservative Soft Actor-Critic
Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but often face problems of instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which seamlessly integrates entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates with the use of relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods. These findings suggest that CSAC provides strong robustness and application potential in control tasks under dynamic environments.
comment: 14 pages, 9 figures
RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation
Achieving both realism and controllability in interactive closed-loop traffic simulation remains a key challenge in autonomous driving. Data-driven simulation methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enhance reliable and controllable closed-loop interactions but often lack expert demonstrations, compromising realism. To address these challenges, we introduce a dual-stage AV-centered simulation framework that conducts open-loop imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and multimodality, followed by closed-loop reinforcement learning fine-tuning in a physics-based simulator to enhance controllability and mitigate covariate shift. In the fine-tuning stage, we propose RIFT, a simple yet effective closed-loop RL fine-tuning strategy that preserves the trajectory-level multimodality through a GRPO-style group-relative advantage formulation, while enhancing controllability and training stability by replacing KL regularization with the dual-clip mechanism. Extensive experiments demonstrate that RIFT significantly improves the realism and controllability of generated traffic scenarios, providing a robust platform for evaluating autonomous vehicle performance in diverse and interactive scenarios.
Miniature multihole airflow sensor for lightweight aircraft over wide speed and angular range
An aircraft's airspeed, angle of attack, and angle of side slip are crucial to its safety, especially when flying close to the stall regime. Various solutions exist, including pitot tubes, angular vanes, and multihole pressure probes. However, current sensors are either too heavy (>30 g) or require large airspeeds (>20 m/s), making them unsuitable for small uncrewed aerial vehicles. We propose a novel multihole pressure probe, integrating sensing electronics in a single-component structure, resulting in a mechanically robust and lightweight sensor (9 g), which we released to the public domain. Since there is no consensus on two critical design parameters, tip shape (conical vs spherical) and hole spacing (distance between holes), we provide a study on measurement accuracy and noise generation using wind tunnel experiments. The sensor is calibrated using a multivariate polynomial regression model over an airspeed range of 3-27 m/s and an angle of attack/sideslip range of +-35{\deg}, achieving a mean absolute error of 0.44 m/s and 0.16{\deg}. Finally, we validated the sensor in outdoor flights near the stall regime. Our probe enabled accurate estimations of airspeed, angle of attack and sideslip during different acrobatic manoeuvres. Due to its size and weight, this sensor will enable safe flight for lightweight, uncrewed aerial vehicles flying at low speeds close to the stall regime.
The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning
We present Mixture of Discrete-time Gaussian Processes (MiDiGap), a novel approach for flexible policy representation and imitation learning in robot manipulation. MiDiGap enables learning from as few as five demonstrations using only camera observations and generalizes across a wide range of challenging tasks. It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug. MiDiGap learns these tasks on a CPU in less than a minute and scales linearly to large datasets. We also develop a rich suite of tools for inference-time steering using evidence such as collision signals and robot kinematic constraints. This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer. MiDiGap achieves state-of-the-art performance on diverse few-shot manipulation benchmarks. On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%. On multimodal tasks, it improves policy success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success. We make the code publicly available at https://midigap.cs.uni-freiburg.de.
comment: Submitted for publication to IEEE Transaction on Robotics
Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces
Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and challenging task. In this paper, we present a method that treats capabilities as contracts for skill implementations and leverages large language models to generate executable code based on natural language user input. A key feature of our approach is the integration of existing software libraries and interface technologies, enabling the generation of skill implementations across different target languages. We introduce a framework that allows users to incorporate their own libraries and resource interfaces into the code generation process through a retrieval-augmented generation architecture. The proposed method is evaluated using an autonomous mobile robot controlled via Python and ROS 2, demonstrating the feasibility and flexibility of the approach.
OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction
The safe operation of autonomous vehicles (AVs) is highly dependent on their understanding of the surroundings. For this, the task of 3D semantic occupancy prediction divides the space around the sensors into voxels, and labels each voxel with both occupancy and semantic information. Recent perception models have used multisensor fusion to perform this task. However, existing multisensor fusion-based approaches focus mainly on using sensor information in the Cartesian coordinate system. This ignores the distribution of the sensor readings, leading to a loss of fine-grained details and performance degradation. In this paper, we propose OccCylindrical that merges and refines the different modality features under cylindrical coordinates. Our method preserves more fine-grained geometry detail that leads to better performance. Extensive experiments conducted on the nuScenes dataset, including challenging rainy and nighttime scenarios, confirm our approach's effectiveness and state-of-the-art performance. The code will be available at: https://github.com/DanielMing123/OccCylindrical
Enabling Robots to Autonomously Search Dynamic Cluttered Post-Disaster Environments
Robots will bring search and rescue (SaR) in disaster response to another level, in case they can autonomously take over dangerous SaR tasks from humans. A main challenge for autonomous SaR robots is to safely navigate in cluttered environments with uncertainties, while avoiding static and moving obstacles. We propose an integrated control framework for SaR robots in dynamic, uncertain environments, including a computationally efficient heuristic motion planning system that provides a nominal (assuming there are no uncertainties) collision-free trajectory for SaR robots and a robust motion tracking system that steers the robot to track this reference trajectory, taking into account the impact of uncertainties. The control architecture guarantees a balanced trade-off among various SaR objectives, while handling the hard constraints, including safety. The results of various computer-based simulations, presented in this paper, showed significant out-performance (of up to 42.3%) of the proposed integrated control architecture compared to two commonly used state-of-the-art methods (Rapidly-exploring Random Tree and Artificial Potential Function) in reaching targets (e.g., trapped victims in SaR) safely, collision-free, and in the shortest possible time.
Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots
Autonomous robots deployed in unknown search-and-rescue (SaR) environments can significantly improve the efficiency of the mission by assisting in fast localisation and rescue of the trapped victims. We propose a novel integrated hierarchical control architecture, called model predictive fuzzy control (MPFC), for autonomous mission planning of multi-robot SaR systems that should efficiently map an unknown environment: We combine model predictive control (MPC) and fuzzy logic control (FLC), where the robots are locally controlled by computationally efficient FLC controllers, and the parameters of these local controllers are tuned via a centralised MPC controller, in a regular or event-triggered manner. The proposed architecture provides three main advantages: (1) The control decisions are made by the FLC controllers, thus the real-time computation time is affordable. (2) The centralised MPC controller optimises the performance criteria with a global and predictive vision of the system dynamics, and updates the parameters of the FLC controllers accordingly. (3) FLC controllers are heuristic by nature and thus do not take into account optimality in their decisions, while the tuned parameters via the MPC controller can indirectly incorporate some level of optimality in local decisions of the robots. A simulation environment for victim detection in a disaster environment was designed in MATLAB using discrete, 2-D grid-based models. While being comparable from the point of computational efficiency, the integrated MPFC architecture improves the performance of the multi-robot SaR system compared to decentralised FLC controllers. Moreover, the performance of MPFC is comparable to the performance of centralised MPC for path planning of SaR robots, whereas MPFC requires significantly less computational resources, since the number of the optimisation variables in the control problem are reduced.
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
Future robotic systems operating in real-world environments will require on-board embodied intelligence without continuous cloud connection, balancing capabilities with constraints on computational power and memory. This work presents an extension of the R1-zero approach, which enables the usage of low parameter-count Large Language Models (LLMs) in the robotic domain. The R1-Zero approach was originally developed to enable mathematical reasoning in LLMs using static datasets. We extend it to the robotics domain through integration in a closed-loop Reinforcement Learning (RL) framework. This extension enhances reasoning in Embodied Artificial Intelligence (Embodied AI) settings without relying solely on distillation of large models through Supervised Fine-Tuning (SFT). We show that small-scale LLMs can achieve effective reasoning performance by learning through closed-loop interaction with their environment, which enables tasks that previously required significantly larger models. In an autonomous driving setting, a performance gain of 20.2%-points over the SFT-based baseline is observed with a Qwen2.5-1.5B model. Using the proposed training procedure, Qwen2.5-3B achieves a 63.3% control adaptability score, surpassing the 58.5% obtained by the much larger, cloud-bound GPT-4o. These results highlight that practical, on-board deployment of small LLMs is not only feasible but can outperform larger models if trained through environmental feedback, underscoring the importance of an interactive learning framework for robotic Embodied AI, one grounded in practical experience rather than static supervision.
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
Embodied foundation models are gaining increasing attention for their zero-shot generalization, scalability, and adaptability to new tasks through few-shot post-training. However, existing models rely heavily on real-world data, which is costly and labor-intensive to collect. Synthetic data offers a cost-effective alternative, yet its potential remains largely underexplored. To bridge this gap, we explore the feasibility of training Vision-Language-Action models entirely with large-scale synthetic action data. We curate SynGrasp-1B, a billion-frame robotic grasping dataset generated in simulation with photorealistic rendering and extensive domain randomization. Building on this, we present GraspVLA, a VLA model pretrained on large-scale synthetic action data as a foundational model for grasping tasks. GraspVLA integrates autoregressive perception tasks and flow-matching-based action generation into a unified Chain-of-Thought process, enabling joint training on synthetic action data and Internet semantics data. This design helps mitigate sim-to-real gaps and facilitates the transfer of learned actions to a broader range of Internet-covered objects, achieving open-vocabulary generalization in grasping. Extensive evaluations across real-world and simulation benchmarks demonstrate GraspVLA's advanced zero-shot generalizability and few-shot adaptability to specific human preferences. We will release SynGrasp-1B dataset and pre-trained weights to benefit the community.
RADE: Learning Risk-Adjustable Driving Environment via Multi-Agent Conditional Diffusion
Generating safety-critical scenarios in high-fidelity simulations offers a promising and cost-effective approach for efficient testing of autonomous vehicles. Existing methods typically rely on manipulating a single vehicle's trajectory through sophisticated designed objectives to induce adversarial interactions, often at the cost of realism and scalability. In this work, we propose the Risk-Adjustable Driving Environment (RADE), a simulation framework that generates statistically realistic and risk-adjustable traffic scenes. Built upon a multi-agent diffusion architecture, RADE jointly models the behavior of all agents in the environment and conditions their trajectories on a surrogate risk measure. Unlike traditional adversarial methods, RADE learns risk-conditioned behaviors directly from data, preserving naturalistic multi-agent interactions with controllable risk levels. To ensure physical plausibility, we incorporate a tokenized dynamics check module that efficiently filters generated trajectories using a motion vocabulary. We validate RADE on the real-world rounD dataset, demonstrating that it preserves statistical realism across varying risk levels and naturally increases the likelihood of safety-critical events as the desired risk level grows up. Our results highlight RADE's potential as a scalable and realistic tool for AV safety evaluation.
Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets
Instruction-Action (IA) data pairs are valuable for training robotic systems, especially autonomous vehicles (AVs), but having humans manually annotate this data is costly and time-inefficient. This paper explores the potential of using mobile application Global Positioning System (GPS) references and Natural Language Processing (NLP) to automatically generate large volumes of IA commands and responses without having a human generate or retroactively tag the data. In our pilot data collection, by driving to various destinations and collecting voice instructions from GPS applications, we demonstrate a means to collect and categorize the diverse sets of instructions, further accompanied by video data to form complete vision-language-action triads. We provide details on our completely automated data collection prototype system, ADVLAT-Engine. We characterize collected GPS voice instructions into eight different classifications, highlighting the breadth of commands and referentialities available for curation from freely available mobile applications. Through research and exploration into the automation of IA data pairs using GPS references, the potential to increase the speed and volume at which high-quality IA datasets are created, while minimizing cost, can pave the way for robust vision-language-action (VLA) models to serve tasks in vision-language navigation (VLN) and human-interactive autonomous systems.
Systematic Evaluation of Initial States and Exploration-Exploitation Strategies in PID Auto-Tuning: A Framework-Driven Approach Applied on Mobile Robots
PID controllers are widely used in control systems because of their simplicity and effectiveness. Although advanced optimization techniques such as Bayesian Optimization and Differential Evolution have been applied to address the challenges of automatic tuning of PID controllers, the influence of initial system states on convergence and the balance between exploration and exploitation remains underexplored. Moreover, experimenting the influence directly on real cyber-physical systems such as mobile robots is crucial for deriving realistic insights. In the present paper, a novel framework is introduced to evaluate the impact of systematically varying these factors on the PID auto-tuning processes that utilize Bayesian Optimization and Differential Evolution. Testing was conducted on two distinct PID-controlled robotic platforms, an omnidirectional robot and a differential drive mobile robot, to assess the effects on convergence rate, settling time, rise time, and overshoot percentage. As a result, the experimental outcomes yield evidence on the effects of the systematic variations, thereby providing an empirical basis for future research studies in the field.
Learn to Swim: Data-Driven LSTM Hydrodynamic Model for Quadruped Robot Gait Optimization ICRA
This paper presents a Long Short-Term Memory network-based Fluid Experiment Data-Driven model (FED-LSTM) for predicting unsteady, nonlinear hydrodynamic forces on the underwater quadruped robot we constructed. Trained on experimental data from leg force and body drag tests conducted in both a recirculating water tank and a towing tank, FED-LSTM outperforms traditional Empirical Formulas (EF) commonly used for flow prediction over flat surfaces. The model demonstrates superior accuracy and adaptability in capturing complex fluid dynamics, particularly in straight-line and turning-gait optimizations via the NSGA-II algorithm. FED-LSTM reduces deflection errors during straight-line swimming and improves turn times without increasing the turning radius. Hardware experiments further validate the model's precision and stability over EF. This approach provides a robust framework for enhancing the swimming performance of legged robots, laying the groundwork for future advances in underwater robotic locomotion.
comment: This work has been accepted for publication in the IEEE International Conference on Robotics and Automation (ICRA) 2025. The final version will be available in IEEE Xplore (DOI to be assigned upon publication)
HCOA*: Hierarchical Class-ordered A* for Navigation in Semantic Environments
This paper addresses the problem of robot navigation in mixed geometric and semantic 3D environments. Given a hierarchical representation of the environment, the objective is to navigate from a start position to a goal while minimizing the computational cost. We introduce Hierarchical Class-ordered A* (HCOA*), an algorithm that leverages the environmental hierarchy for efficient path-planning in semantic graphs, significantly reducing computational effort. We use a total order over the semantic classes and prove theoretical performance guarantees for the algorithm. We propose two approaches for higher-layer node classification based on the node semantics of the lowest layer: a Graph Neural Network-based method and a Majority-Class method. We evaluate our approach through simulations on a 3D Scene Graph (3DSG), comparing it to the state-of-the-art and assessing its performance against our classification approaches. Results show that HCOA* can find the optimal path while reducing the number of expanded nodes by 25% and achieving a 16% reduction in computational time on the uHumans2 3DSG dataset.
comment: 6 pages, 4 figures
Global Task-aware Fault Detection, Identification For On-Orbit Multi-Spacecraft Collaborative Inspection
In this paper, we present a global-to-local task-aware fault detection and identification algorithm to detect failures in a multi-spacecraft system performing a collaborative inspection (referred to as global) task. The inspection task is encoded as a cost functional $\costH$ that informs global (task allocation and assignment) and local (agent-level) decision-making. The metric $\costH$ is a function of the inspection sensor model, and the agent full-pose. We use the cost functional $\costH$ to design a metric that compares the expected and actual performance to detect the faulty agent using a threshold. We use higher-order cost gradients $\costH$ to derive a new metric to identify the type of fault, including task-specific sensor fault, an agent-level actuator, and sensor faults. Furthermore, we propose an approach to design adaptive thresholds for each fault mentioned above to incorporate the time dependence of the inspection task. We demonstrate the efficacy of the proposed method empirically, by simulating and detecting faults (such as inspection sensor faults, actuators, and sensor faults) in a low-Earth orbit collaborative spacecraft inspection task using the metrics and the threshold designed using the global task cost $\costH$.
comment: published. 33rd AAS AIAA Conference 2023
Fabrication and Characterization of Additively Manufactured Stretchable Strain Sensors Towards the Shape Sensing of Continuum Robots
This letter describes the manufacturing and experimental characterization of novel stretchable strain sensors for continuum robots. The overarching goal of this research is to provide a new solution for the shape sensing of these devices. The sensors are fabricated via direct ink writing, an extrusion-based additive manufacturing technique. Electrically conductive material (i.e., the \textit{ink}) is printed into traces whose electrical resistance varies in response to mechanical deformation. The principle of operation of stretchable strain sensors is analogous to that of conventional strain gauges, but with a significantly larger operational window thanks to their ability to withstand larger strain. Among the different conductive materials considered for this study, we opted to fabricate the sensors with a high-viscosity eutectic Gallium-Indium ink, which in initial testing exhibited high linearity ($R^2 \approx$ 0.99), gauge factor $\approx$ 1, and negligible drift. Benefits of the proposed sensors include (i) ease of fabrication, as they can be conveniently printed in a matter of minutes; (ii) ease of installation, as they can simply be glued to the outside body of a robot; (iii) ease of miniaturization, which enables integration into millimiter-sized continuum robots.
comment: Author's manuscript. Accepted for publication in IEEE Robotics and Automation Letters
Latent Adaptive Planner for Dynamic Manipulation
This paper presents Latent Adaptive Planner (LAP), a novel approach for dynamic nonprehensile manipulation tasks that formulates planning as latent space inference, effectively learned from human demonstration videos. Our method addresses key challenges in visuomotor policy learning through a principled variational replanning framework that maintains temporal consistency while efficiently adapting to environmental changes. LAP employs Bayesian updating in latent space to incrementally refine plans as new observations become available, striking an optimal balance between computational efficiency and real-time adaptability. We bridge the embodiment gap between humans and robots through model-based proportional mapping that regenerates accurate kinematic-dynamic joint states and object positions from human demonstrations. Experimental evaluations across multiple complex manipulation benchmarks demonstrate that LAP achieves state-of-the-art performance, outperforming existing approaches in success rate, trajectory smoothness, and energy efficiency, particularly in dynamic adaptation scenarios. Our approach enables robots to perform complex interactions with human-like adaptability while providing an expandable framework applicable to diverse robotic platforms using the same human demonstration videos.
PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers SIGGRAPH
Humans excel in navigating diverse, complex environments with agile motor skills, exemplified by parkour practitioners performing dynamic maneuvers, such as climbing up walls and jumping across gaps. Reproducing these agile movements with simulated characters remains challenging, in part due to the scarcity of motion capture data for agile terrain traversal behaviors and the high cost of acquiring such data. In this work, we introduce PARC (Physics-based Augmentation with Reinforcement Learning for Character Controllers), a framework that leverages machine learning and physics-based simulation to iteratively augment motion datasets and expand the capabilities of terrain traversal controllers. PARC begins by training a motion generator on a small dataset consisting of core terrain traversal skills. The motion generator is then used to produce synthetic data for traversing new terrains. However, these generated motions often exhibit artifacts, such as incorrect contacts or discontinuities. To correct these artifacts, we train a physics-based tracking controller to imitate the motions in simulation. The corrected motions are then added to the dataset, which is used to continue training the motion generator in the next iteration. PARC's iterative process jointly expands the capabilities of the motion generator and tracker, creating agile and versatile models for interacting with complex environments. PARC provides an effective approach to develop controllers for agile terrain traversal, which bridges the gap between the scarcity of motion data and the need for versatile character controllers.
comment: SIGGRAPH Conference Papers 2025
NMPC-Lander: Nonlinear MPC with Barrier Function for UAV Landing on a Mobile Platform
Quadcopters are versatile aerial robots gaining popularity in numerous critical applications. However, their operational effectiveness is constrained by limited battery life and restricted flight range. To address these challenges, autonomous drone landing on stationary or mobile charging and battery-swapping stations has become an essential capability. In this study, we present NMPC-Lander, a novel control architecture that integrates Nonlinear Model Predictive Control (NMPC) with Control Barrier Functions (CBF) to achieve precise and safe autonomous landing on both static and dynamic platforms. Our approach employs NMPC for accurate trajectory tracking and landing, while simultaneously incorporating CBF to ensure collision avoidance with static obstacles. Experimental evaluations on the real hardware demonstrate high precision in landing scenarios, with an average final position error of 9.0 cm and 11 cm for stationary and mobile platforms, respectively. Notably, NMPC-Lander outperforms the B-spline combined with the A* planning method by nearly threefold in terms of position tracking, underscoring its superior robustness and practical effectiveness.
comment: This manuscript has been submitted to the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2025
MIHRaGe: A Mixed-Reality Interface for Human-Robot Interaction via Gaze-Oriented Control
Individuals with upper limb mobility impairments often require assistive technologies to perform activities of daily living. While gaze-tracking has emerged as a promising method for robotic assistance, existing solutions lack sufficient feedback mechanisms, leading to uncertainty in user intent recognition and reduced adaptability. This paper presents the MIHRAGe interface, an integrated system that combines gaze-tracking, robotic assistance, and a mixed-reality to create an immersive environment for controlling the robot using only eye movements. The system was evaluated through an experimental protocol involving four participants, assessing gaze accuracy, robotic positioning precision, and the overall success of a pick and place task. Results showed an average gaze fixation error of 1.46 cm, with individual variations ranging from 1.28 cm to 2.14 cm. The robotic arm demonstrated an average positioning error of +-1.53 cm, with discrepancies attributed to interface resolution and calibration constraints. In a pick and place task, the system achieved a success rate of 80%, highlighting its potential for improving accessibility in human-robot interaction with visual feedback to the user.
Omnidirectional vision sensors based on catadioptric systems with discrete infrared photoreceptors for swarm robotics
In this work, we fabricated and studied two designs for omnidirectional vision sensors for swarm robotics, based on catadioptric systems consisting of a mirror with rotational symmetry, eight discrete infrared photodiodes and a single LED, in order to provide localization and navigation abilities for mobile robotic agents. We considered two arrangements for the photodiodes: one in which they point upward into the mirror, and one in which they point outward, perpendicular to the mirror. To determine which design offers a better field of view on the plane, as well as detection of distance and orientation between two agents, we developed a test rail with three degrees of freedom to experimentally and systematically measure the signal registered by the photodiodes of a given sensor (in a single readout) from the light emitted by another as functions of the distance and orientation. Afterwards, we processed and analyzed the experimental data to develop mathematical models for the mean response of a photodiode in each design. Finally, by numerically inverting the models, we compared the two designs in terms of their accuracy. Our results show that the design with the photodiodes pointing upward resolves better the distance, while the other resolves better the orientation of the emitting agent, both providing an omnidirectional field of view.
comment: 24 pages, 12 figures
Improving Failure Prediction in Aircraft Fastener Assembly Using Synthetic Data in Imbalanced Datasets
Automating aircraft manufacturing still relies heavily on human labor due to the complexity of the assembly processes and customization requirements. One key challenge is achieving precise positioning, especially for large aircraft structures, where errors can lead to substantial maintenance costs or part rejection. Existing solutions often require costly hardware or lack flexibility. Used in aircraft by the thousands, threaded fasteners, e.g., screws, bolts, and collars, are traditionally executed by fixed-base robots and usually have problems in being deployed in the mentioned manufacturing sites. This paper emphasizes the importance of error detection and classification for efficient and safe assembly of threaded fasteners, especially aeronautical collars. Safe assembly of threaded fasteners is paramount since acquiring sufficient data for training deep learning models poses challenges due to the rarity of failure cases and imbalanced datasets. The paper addresses this by proposing techniques like class weighting and data augmentation, specifically tailored for temporal series data, to improve classification performance. Furthermore, the paper introduces a novel problem-modeling approach, emphasizing metrics relevant to collar assembly rather than solely focusing on accuracy. This tailored approach enhances the models' capability to handle the challenges of threaded fastener assembly effectively.
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is a lack of sufficient open-source work for further performance analysis and optimization. To address this problem, this paper will summarize and compare the structural designs of existing dual-system architectures, and conduct systematic empirical evaluations on the core design elements of existing dual-system architectures. Ultimately, it will provide a low-cost open-source model for further exploration. Of course, this project will continue to update with more experimental conclusions and open-source models with improved performance for everyone to choose from. Project page: https://openhelix-robot.github.io/.
Constraint Selection in Optimization-Based Controllers
Human-machine collaboration often involves constrained optimization problems for decision-making processes. However, when the machine is a dynamical system with a continuously evolving state, infeasibility due to multiple conflicting constraints can lead to dangerous outcomes. In this work, we propose a heuristic-based method that resolves infeasibility at every time step by selectively disregarding a subset of soft constraints based on the past values of the Lagrange multipliers. Compared to existing approaches, our method requires the solution of a smaller optimization problem to determine feasibility, resulting in significantly faster computation. Through a series of simulations, we demonstrate that our algorithm achieves performance comparable to state-of-the-art methods while offering improved computational efficiency.
comment: Submitted to IEEE Control Systems Letters (L-CSS)
Spatio-Temporal Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset
Monitoring orchards at the individual tree or fruit level throughout the growth season is crucial for plant phenotyping and horticultural resource optimization, such as chemical use and yield estimation. We present a 4D spatio-temporal metric-semantic mapping system that integrates multi-session measurements to track fruit growth over time. Our approach combines a LiDAR-RGB fusion module for 3D fruit localization with a 4D fruit association method leveraging positional, visual, and topology information for improved data association precision. Evaluated on real orchard data, our method achieves a 96.9% fruit counting accuracy for 1,790 apples across 60 trees, a mean fruit size estimation error of 1.1 cm, and a 23.7% improvement in 4D data association precision over baselines. We publicly release a multimodal dataset covering five fruit species across their growth seasons at https://4d-metric-semantic-mapping.org/
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Adapting autonomous agents to industrial, domestic, and other daily tasks is currently gaining momentum. However, in the global or cross-lingual application contexts, ensuring effective interaction with the environment and executing unrestricted human task-specified instructions in diverse languages remains an unsolved problem. To address this challenge, we propose ReLI, a language-agnostic framework designed to enable autonomous agents to converse naturally, semantically reason about the environment, and to perform downstream tasks, regardless of the task instruction's linguistic origin. First, we ground large-scale pre-trained foundation models and transform them into language-to-action models that can directly provide common-sense reasoning and high-level robot control through natural, free-flow human-robot conversational interactions. Further, we perform cross-lingual grounding of the models to ensure that ReLI generalises across the global languages. To demonstrate the ReLI's robustness, we conducted extensive simulated and real-world experiments on various short- and long-horizon tasks, including zero-shot and few-shot spatial navigation, scene information retrieval, and query-oriented tasks. We benchmarked the performance on 140 languages involving over 70K multi-turn conversations. On average, ReLI achieved over 90%$\pm$0.2 accuracy in cross-lingual instruction parsing and task execution success rates. These results demonstrate the ReLI's potential to enhance natural human-robot interaction in the real world while championing linguistic diversity. Demonstrations and resources will be publicly available at https://linusnep.github.io/ReLI/.
Robotic Visual Instruction
Recently, natural language has been the primary medium for human-robot interaction. However, its inherent lack of spatial precision introduces challenges for robotic task definition such as ambiguity and verbosity. Moreover, in some public settings where quiet is required, such as libraries or hospitals, verbal communication with robots is inappropriate. To address these limitations, we introduce the Robotic Visual Instruction (RoVI), a novel paradigm to guide robotic tasks through an object-centric, hand-drawn symbolic representation. RoVI effectively encodes spatial-temporal information into human-interpretable visual instructions through 2D sketches, utilizing arrows, circles, colors, and numbers to direct 3D robotic manipulation. To enable robots to understand RoVI better and generate precise actions based on RoVI, we present Visual Instruction Embodied Workflow (VIEW), a pipeline formulated for RoVI-conditioned policies. This approach leverages Vision-Language Models (VLMs) to interpret RoVI inputs, decode spatial and temporal constraints from 2D pixel space via keypoint extraction, and then transform them into executable 3D action sequences. We additionally curate a specialized dataset of 15K instances to fine-tune small VLMs for edge deployment,enabling them to effectively learn RoVI capabilities. Our approach is rigorously validated across 11 novel tasks in both real and simulated environments, demonstrating significant generalization capability. Notably, VIEW achieves an 87.5% success rate in real-world scenarios involving unseen tasks that feature multi-step actions, with disturbances, and trajectory-following requirements. Project website: https://robotic-visual-instruction.github.io/
comment: Project website: https://robotic-visual-instruction.github.io/
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction
This paper introduces a novel framework integrating nonlinear acoustic computing and reinforcement learning to enhance advanced human-robot interaction under complex noise and reverberation. Leveraging physically informed wave equations (e.g., Westervelt, KZK), the approach captures higher-order phenomena such as harmonic generation and shock formation. By embedding these models in a reinforcement learning-driven control loop, the system adaptively optimizes key parameters (e.g., absorption, beamforming) to mitigate multipath interference and non-stationary noise. Experimental evaluations, covering far-field localization, weak signal detection, and multilingual speech recognition, demonstrate that this hybrid strategy surpasses traditional linear methods and purely data-driven baselines, achieving superior noise suppression, minimal latency, and robust accuracy in demanding real-world scenarios. The proposed system demonstrates broad application prospects in AI hardware, robot, machine audition, artificial audition, and brain-machine interfaces.
comment: 34 pages, 11 figures, 10 tables, and 10 equations
J-PARSE: Jacobian-based Projection Algorithm for Resolving Singularities Effectively in Inverse Kinematic Control of Serial Manipulators
J-PARSE is a method for smooth first-order inverse kinematic control of a serial manipulator near kinematic singularities. The commanded end-effector velocity is interpreted component-wise, according to the available mobility in each dimension of the task space. First, a substitute "Safety" Jacobian matrix is created, keeping the aspect ratio of the manipulability ellipsoid above a threshold value. The desired motion is then projected onto non-singular and singular directions, and the latter projection scaled down by a factor informed by the threshold value. A right-inverse of the non-singular Safety Jacobian is applied to the modified command. In the absence of joint limits and collisions, this ensures smooth transition into and out of low-rank poses, guaranteeing asymptotic stability for target poses within the workspace, and stability for those outside. Velocity control with J-PARSE is benchmarked against the Least-Squares and Damped Least-Squares inversions of the Jacobian, and shows high accuracy in reaching and leaving singular target poses. By expanding the available workspace of manipulators, the method finds applications in servoing, teleoperation, and learning.
comment: 18 pages, 25 figures. v1: Fig. 1 replaced with faster-loading version
DEFT: Differentiable Branched Discrete Elastic Rods for Modeling Furcated DLOs in Real-Time
Autonomous wire harness assembly requires robots to manipulate complex branched cables with high precision and reliability. A key challenge in automating this process is predicting how these flexible and branched structures behave under manipulation. Without accurate predictions, it is difficult for robots to reliably plan or execute assembly operations. While existing research has made progress in modeling single-threaded Deformable Linear Objects (DLOs), extending these approaches to Branched Deformable Linear Objects (BDLOs) presents fundamental challenges. The junction points in BDLOs create complex force interactions and strain propagation patterns that cannot be adequately captured by simply connecting multiple single-DLO models. To address these challenges, this paper presents Differentiable discrete branched Elastic rods for modeling Furcated DLOs in real-Time (DEFT), a novel framework that combines a differentiable physics-based model with a learning framework to: 1) accurately model BDLO dynamics, including dynamic propagation at junction points and grasping in the middle of a BDLO, 2) achieve efficient computation for real-time inference, and 3) enable planning to demonstrate dexterous BDLO manipulation. A comprehensive series of real-world experiments demonstrates DEFT's efficacy in terms of accuracy, computational speed, and generalizability compared to state-of-the-art alternatives. Project page:https://roahmlab.github.io/DEFT/.
High-order regularization dealing with ill-conditioned robot localization problems
In this work, we propose a high-order regularization method to solve the ill-conditioned problems in robot localization. Numerical solutions to robot localization problems are often unstable when the problems are ill-conditioned. A typical way to solve ill-conditioned problems is regularization, and a classical regularization method is the Tikhonov regularization. It is shown that the Tikhonov regularization is a low-order case of our method. We find that the proposed method is superior to the Tikhonov regularization in approximating some ill-conditioned inverse problems, such as some basic robot localization problems. The proposed method overcomes the over-smoothing problem in the Tikhonov regularization as it uses more than one term in the approximation of the matrix inverse, and an explanation for the over-smoothing of the Tikhonov regularization is given. Moreover, one a priori criterion, which improves the numerical stability of the ill-conditioned problem, is proposed to obtain an optimal regularization matrix. As most of the regularization solutions are biased, we also provide two bias-correction techniques for the proposed high-order regularization. The simulation and experimental results using an Ultra-Wideband sensor network in a 3D environment are discussed, demonstrating the performance of the proposed method.
comment: This paper has been accepted by IEEE Transactions on Robotics and the final version is available on IEEE Xplore
Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models
Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, leading to reduced efficiency and robustness in complex environments. To address these limitations, we introduce Co-NavGPT, a novel framework that integrates a Vision Language Model (VLM) as a global planner to enable common-sense multi-robot visual target navigation. Co-NavGPT aggregates sub-maps from multiple robots with diverse viewpoints into a unified global map, encoding robot states and frontier regions. The VLM uses this information to assign frontiers across the robots, facilitating coordinated and efficient exploration. Experiments on the Habitat-Matterport 3D (HM3D) demonstrate that Co-NavGPT outperforms existing baselines in terms of success rate and navigation efficiency, without requiring task-specific training. Ablation studies further confirm the importance of semantic priors from the VLM. We also validate the framework in real-world scenarios using quadrupedal robots. Supplementary video and code are available at: https://sites.google.com/view/co-navgpt2.
comment: 8 pages, 4 figures
Adversarial and Reactive Traffic Entities for Behavior-Realistic Driving Simulation: A Review
Despite advancements in perception and planning for autonomous vehicles (AVs), validating their performance remains a significant challenge. The deployment of planning algorithms in real-world environments is often ineffective due to discrepancies between simulations and real traffic conditions. Evaluating AVs planning algorithms in simulation typically involves replaying driving logs from recorded real-world traffic. However, entities replayed from offline data are not reactive, lack the ability to respond to arbitrary AV behavior, and cannot behave in an adversarial manner to test certain properties of the driving policy. Therefore, simulation with realistic and potentially adversarial entities represents a critical task for AV planning software validation. In this work, we aim to review current research efforts in the field of traffic simulation, focusing on the application of advanced techniques for modeling realistic and adversarial behaviors of traffic entities. The objective of this work is to categorize existing approaches based on the proposed classes of traffic entity behavior and scenario behavior control. Moreover, we collect traffic datasets and examine existing traffic simulations with respect to their employed default traffic entities. Finally, we identify challenges and open questions that hold potential for future research.
comment: Submitted to the IEEE for possible publication, 8 pages, 1 figures
Rule-Based Lloyd Algorithm for Multi-Robot Motion Planning and Control with Safety and Convergence Guarantees
This paper presents a distributed rule-based Lloyd algorithm (RBL) for multi-robot motion planning and control. The main limitations of the basic Loyd-based algorithm (LB) concern deadlock issues and the failure to address dynamic constraints effectively. Our contribution is twofold. First, we show how RBL is able to provide safety and convergence to the goal region without relying on communication between robots, nor synchronization between the robots. We considered different dynamic constraints with control inputs saturation. Second, we show that the Lloyd-based algorithm (without rules) can be successfully used as a safety layer for learning-based approaches, leading to non-negligible benefits. We further prove the soundness, reliability, and scalability of RBL through extensive simulations, comparisons with the state of the art, and experimental validations on small-scale car-like robots, unicycle-like robots, omnidirectional robots, and aerial robots on the field.
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Robotic manipulation faces critical challenges in understanding spatial affordances--the "where" and "how" of object interactions--essential for complex manipulation tasks like wiping a board or stacking objects. Existing methods, including modular-based and end-to-end approaches, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points data and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention for motion-aware feature extraction and a Spatial Information Aggregation Layer for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning
Because imitation learning relies on human demonstrations in hard-to-simulate settings, the inclusion of force control in this method has resulted in a shortage of training data, even with a simple change in speed. Although the field of data augmentation has addressed the lack of data, conventional methods of data augmentation for robot manipulation are limited to simulation-based methods or downsampling for position control. This paper proposes a novel method of data augmentation that is applicable to force control and preserves the advantages of real-world datasets. We applied teaching-playback at variable speeds as real-world data augmentation to increase both the quantity and quality of environmental reactions at variable speeds. An experiment was conducted on bilateral control-based imitation learning using a method of imitation learning equipped with position-force control. We evaluated the effect of real-world data augmentation on two tasks, pick-and-place and wiping, at variable speeds, each from two human demonstrations at fixed speed. The results showed a maximum 55% increase in success rate from a simple change in speed of real-world reactions and improved accuracy along the duration/frequency command by gathering environmental reactions at variable speeds.
comment: 16 pages, 12 figures, 4 tables. This is a preprint of an article whose final and definitive form has been published in ADVANCED ROBOTICS 2025, copyright Taylor & Francis and Robotics Society of Japan, is available online at: http://www.tandfonline.com/10.1080/01691864.2025.2497423; doi:10.1080/01691864.2025.2497423
Leveraging Computation of Expectation Models for Commonsense Affordance Estimation on 3D Scene Graphs IROS24
This article studies the commonsense object affordance concept for enabling close-to-human task planning and task optimization of embodied robotic agents in urban environments. The focus of the object affordance is on reasoning how to effectively identify object's inherent utility during the task execution, which in this work is enabled through the analysis of contextual relations of sparse information of 3D scene graphs. The proposed framework develops a Correlation Information (CECI) model to learn probability distributions using a Graph Convolutional Network, allowing to extract the commonsense affordance for individual members of a semantic class. The overall framework was experimentally validated in a real-world indoor environment, showcasing the ability of the method to level with human commonsense. For a video of the article, showcasing the experimental demonstration, please refer to the following link: https://youtu.be/BDCMVx2GiQE
comment: Accepted at IROS24
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration AAAI 2025
Understanding how humans cooperatively utilize semantic knowledge to explore unfamiliar environments and decide on navigation directions is critical for house service multi-robot systems. Previous methods primarily focused on single-robot centralized planning strategies, which severely limited exploration efficiency. Recent research has considered decentralized planning strategies for multiple robots, assigning separate planning models to each robot, but these approaches often overlook communication costs. In this work, we propose Multimodal Chain-of-Thought Co-Navigation (MCoCoNav), a modular approach that utilizes multimodal Chain-of-Thought to plan collaborative semantic navigation for multiple robots. MCoCoNav combines visual perception with Vision Language Models (VLMs) to evaluate exploration value through probabilistic scoring, thus reducing time costs and achieving stable outputs. Additionally, a global semantic map is used as a communication bridge, minimizing communication overhead while integrating observational results. Guided by scores that reflect exploration trends, robots utilize this map to assess whether to explore new frontier points or revisit history nodes. Experiments on HM3D_v0.2 and MP3D demonstrate the effectiveness of our approach. Our code is available at https://github.com/FrankZxShen/MCoCoNav.git.
comment: 16 pages, 10 figures, Extended Version of accepted AAAI 2025 Paper
Visual-Based Forklift Learning System Enabling Zero-Shot Sim2Real Without Real-World Data ICRA
Forklifts are used extensively in various industrial settings and are in high demand for automation. In particular, counterbalance forklifts are highly versatile and employed in diverse scenarios. However, efforts to automate these processes are lacking, primarily owing to the absence of a safe and performance-verifiable development environment. This study proposes a learning system that combines a photorealistic digital learning environment with a 1/14-scale robotic forklift environment to address this challenge. Inspired by the training-based learning approach adopted by forklift operators, we employ an end-to-end vision-based deep reinforcement learning approach. The learning is conducted in a digitalized environment created from CAD data, making it safe and eliminating the need for real-world data. In addition, we safely validate the method in a physical setting utilizing a 1/14-scale robotic forklift with a configuration similar to that of a real forklift. We achieved a 60% success rate in pallet loading tasks in real experiments using a robotic forklift. Our approach demonstrates zero-shot sim2real with a simple method that does not require heuristic additions. This learning-based approach is considered a first step towards the automation of counterbalance forklifts.
comment: Accepted for publication in: 2025 International Conference on Robotics and Automation (ICRA)
Neural Configuration-Space Barriers for Manipulation Planning and Control
Planning and control for high-dimensional robot manipulators in cluttered, dynamic environments require both computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as robot body representations, we propose a unified framework for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduce uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that explicitly accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a 6-DoF xArm manipulator show that our neural CDF barrier formulation enables efficient planning and robust real-time safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments
We introduce a novel method for mobile robot navigation in dynamic, unknown environments, leveraging onboard sensing and distributionally robust optimization to impose probabilistic safety constraints. Our method introduces a distributionally robust control barrier function (DR-CBF) that directly integrates noisy sensor measurements and state estimates to define safety constraints. This approach is applicable to a wide range of control-affine dynamics, generalizable to robots with complex geometries, and capable of operating at real-time control frequencies. Coupled with a control Lyapunov function (CLF) for path following, the proposed CLF-DR-CBF control synthesis method achieves safe, robust, and efficient navigation in challenging environments. We demonstrate the effectiveness and robustness of our approach for safe autonomous navigation under uncertainty in simulations and real-world experiments with differential-drive robots.
comment: Project page: https://existentialrobotics.org/DRO_Safe_Navigation
Scratch Team of Single-Rotor Robots and Decentralized Cooperative Transportation with Robot Failure
Achieving cooperative transportation by aerial robot teams ensures flexibility regarding payloads and robustness against failures, which has garnered significant attention in recent years. This study proposes a flexible decentralized controller for robots and the shapes of payloads in a cooperative transport task using multiple single-rotor robots. The proposed controller is robust to mass and center of mass (COM) fluctuations and robot failures. Moreover, it possesses asymptotic stability against dynamics errors. Additionally, the controller supports heterogeneous single-rotor robots. Thus, robots with different specifications and deterioration may be effectively utilized for cooperative transportation. This performance is particularly effective for robot reuse. To achieve the aforementioned performance, the controller consists of a parallel structure comprising two controllers: a feedback controller, which renders the system strictly positive real, and a nonlinear controller, which renders the object asymptotic to the target. First, we confirm cooperative transportation using 8 and 10 robots for two shapes through numerical simulation. Subsequently, the cooperative transportation of a rectangle payload (with a weight of approximately 3 kg and maximum length of 1.6 m) is demonstrated using a robot team consisting of three types of robots, even under robot failure and fluctuation in the COM.
Cooperative Transportation using Multiple Single-Rotor Robots and Decentralized Control for Unknown Payloads ICRA
Cooperative transportation via multiple aerial robots has the potential to support various payloads and reduce the chances of them being dropped. Furthermore, autonomously controlled robots render the system scalable with respect to the payload. In this study, a cooperative transportation system was developed using rigidly attached single-rotor robots, and a decentralized controller was proposed to guarantee asymptotic stability of the error dynamics for unknown strictly positive real systems. A feedback controller was used to transform unstable systems into strictly positive real ones considering the shared attachment positions. First, the cooperative transportation of unknown payloads with different shapes larger than the carrier robots was investigated via numerical simulations. Second, cooperative transportation of an unknown payload (with a weight of approximately 2.7 kg and maximum length of 1.6 m) was demonstrated using eight robots, even under robot failure. Finally, the proposed system was shown to be capable of carrying an unknown payload, even if the attachment positions were not shared, that is, even if asymptotic stability was not strictly guaranteed.
comment: Accepted for publication in: 2022 International Conference on Robotics and Automation (ICRA), pp. 2024-2030, 2022. doi: 10.1109/ICRA46639.2022.9811768
Autonomous Cooperative Transportation System involving Multi-Aerial Robots with Variable Attachment Mechanism IROS
Cooperative transportation by multi-aerial robots has the potential to support various payloads and improve failsafe against dropping. Furthermore, changing the attachment positions of robots according payload characteristics increases the stability of transportation. However, there are almost no transportation systems capable of scaling to the payload weight and size and changing the optimal attachment positions. To address this issue, we propose a cooperative transportation system comprising autonomously executable software and suitable hardware, covering the entire process, from pre-takeoff setting to controlled flight. The proposed system decides the formation of the attachment positions by prioritizing controllability based on the center of gravity obtained from Bayesian estimations with robot pairs. We investigated the cooperative transportation of an unknown payload larger than that of whole carrier robots through numerical simulations. Furthermore, we performed cooperative transportation of an unknown payload (with a weight of about 3.2 kg and maximum length of 1.76 m) using eight robots. The proposed system was found to be versatile with regard to handling unknown payloads with different shapes and center-of-gravity positions.
comment: Accepted for publication in: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6322-6328, 2021. doi: 10.1109/IROS51168.2021.9636145
Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping
Imitation learning has enabled robots to perform complex, long-horizon tasks in challenging dexterous manipulation settings. As new methods are developed, they must be rigorously evaluated and compared against corresponding baselines through repeated evaluation trials. However, policy comparison is fundamentally constrained by a small feasible sample size (e.g., 10 or 50) due to significant human effort and limited inference throughput of policies. This paper proposes a novel statistical framework for rigorously comparing two policies in the small sample size regime. Prior work in statistical policy comparison relies on batch testing, which requires a fixed, pre-determined number of trials and lacks flexibility in adapting the sample size to the observed evaluation data. Furthermore, extending the test with additional trials risks inducing inadvertent p-hacking, undermining statistical assurances. In contrast, our proposed statistical test is sequential, allowing researchers to decide whether or not to run more trials based on intermediate results. This adaptively tailors the number of trials to the difficulty of the underlying comparison, saving significant time and effort without sacrificing probabilistic correctness. Extensive numerical simulation and real-world robot manipulation experiments show that our test achieves near-optimal stopping, letting researchers stop evaluation and make a decision in a near-minimal number of trials. Specifically, it reduces the number of evaluation trials by up to 37% as compared to state-of-the-art baselines, while preserving the probabilistic correctness and statistical power of the comparison. Moreover, our method is strongest in the most challenging comparison instances (requiring the most evaluation trials); in a multi-task comparison scenario, we save the evaluator more than 200 simulation rollouts.
comment: 19 pages, 10 figures, 4 tables. Accepted to RSS 2025
To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted Feeding
Robot-assisted bite acquisition involves picking up food items with varying shapes, compliance, sizes, and textures. Fully autonomous strategies may not generalize efficiently across this diversity. We propose leveraging feedback from the care recipient when encountering novel food items. However, frequent queries impose a workload on the user. We formulate human-in-the-loop bite acquisition within a contextual bandit framework and introduce LinUCB-QG, a method that selectively asks for help using a predictive model of querying workload based on query types and timings. This model is trained on data collected in an online study involving 14 participants with mobility limitations, 3 occupational therapists simulating physical limitations, and 89 participants without limitations. We demonstrate that our method better balances task performance and querying workload compared to autonomous and always-querying baselines and adjusts its querying behavior to account for higher workload in users with mobility limitations. We validate this through experiments in a simulated food dataset and a user study with 19 participants, including one with severe mobility limitations. Please check out our project website at: http://emprise.cs.cornell.edu/hilbiteacquisition/
comment: The second and third authors contributed equally. The last two authors advised equally
LiRA: Light-Robust Adversary for Model-based Reinforcement Learning in Real World
Model-based reinforcement learning has attracted much attention due to its high sample efficiency and is expected to be applied to real-world robotic applications. In the real world, as unobservable disturbances can lead to unexpected situations, robot policies should be taken to improve not only control performance but also robustness. Adversarial learning is an effective way to improve robustness, but excessive adversary would increase the risk of malfunction, and make the control performance too conservative. Therefore, this study addresses a new adversarial learning framework to make reinforcement learning robust moderately and not conservative too much. To this end, the adversarial learning is first rederived with variational inference. In addition, \textit{light robustness}, which allows for maximizing robustness within an acceptable performance degradation, is utilized as a constraint. As a result, the proposed framework, so-called LiRA, can automatically adjust adversary level, balancing robustness and conservativeness. The expected behaviors of LiRA are confirmed in numerical simulations. In addition, LiRA succeeds in learning a force-reactive gait control of a quadrupedal robot only with real-world data collected less than two hours.
comment: 21 pages, 17 figures (accepted in Robotics and Autonomous Systems)
Context-aware LLM-based Safe Control Against Latent Risks
Autonomous control systems face significant challenges in performing complex tasks in the presence of latent risks. To address this, we propose an integrated framework that combines Large Language Models (LLMs), numerical optimization, and optimization-based control to facilitate efficient subtask learning while ensuring safety against latent risks. The framework decomposes complex tasks into a sequence of context-aware subtasks that account for latent risks. These subtasks and their parameters are then refined through a multi-time-scale process: high-layer multi-turn in-context learning, mid-layer LLM Chain-of-Thought reasoning and numerical optimization, and low-layer model predictive control. The framework iteratively improves decisions by leveraging qualitative feedback and optimized trajectory data from lower-layer optimization processes and a physics simulator. We validate the proposed framework through simulated case studies involving robot arm and autonomous vehicle scenarios. The experiments demonstrate that the proposed framework can mediate actions based on the context and latent risks and learn complex behaviors efficiently.
Online Controller Synthesis for Robot Collision Avoidance: A Case Study
The inherent uncertainty of dynamic environments poses significant challenges for modeling robot behavior, particularly in tasks such as collision avoidance. This paper presents an online controller synthesis framework tailored for robots equipped with deep learning-based perception components, with a focus on addressing distribution shifts. Our approach integrates periodic monitoring and repair mechanisms for the deep neural network perception component, followed by uncertainty reassessment. These uncertainty evaluations are injected into a parametric discrete-time markov chain, enabling the synthesis of robust controllers via probabilistic model checking. To ensure high system availability during the repair process, we propose a dual-component configuration that seamlessly transitions between operational states. Through a case study on robot collision avoidance, we demonstrate the efficacy of our method, showcasing substantial performance improvements over baseline approaches. This work provides a comprehensive and scalable solution for enhancing the safety and reliability of autonomous systems operating in uncertain environments.
comment: The collaborators disagree to publish on this platform
Human-Robot Interaction and Perceived Irrationality: A Study of Trust Dynamics and Error Acknowledgment
As robots become increasingly integrated into various industries, understanding how humans respond to robotic failures is critical. This study systematically examines trust dynamics and system design by analyzing human reactions to robot failures. We conducted a four-stage survey to explore how trust evolves throughout human-robot interactions. The first stage collected demographic data and initial trust levels. The second stage focused on preliminary expectations and perceptions of robotic capabilities. The third stage examined interaction details, including robot precision and error acknowledgment. Finally, the fourth stage assessed post-interaction perceptions, evaluating trust dynamics, forgiveness, and willingness to recommend robotic technologies. Results indicate that trust in robotic systems significantly increased when robots acknowledged their errors or limitations. Additionally, participants showed greater willingness to suggest robots for future tasks, highlighting the importance of direct engagement in shaping trust dynamics. These findings provide valuable insights for designing more transparent, responsive, and trustworthy robotic systems. By enhancing our understanding of human-robot interaction (HRI), this study contributes to the development of robotic technologies that foster greater public acceptance and adoption.
comment: 8 pages, 8 figures, 1 table, Ongoing Research
Characterizing Trust and Resilience in Distributed Consensus for Cyberphysical Systems
This work considers the problem of resilient consensus where stochastic values of trust between agents are available. Specifically, we derive a unified mathematical framework to characterize convergence, deviation of the consensus from the true consensus value, and expected convergence rate, when there exists additional information of trust between agents. We show that under certain conditions on the stochastic trust values and consensus protocol: 1) almost sure convergence to a common limit value is possible even when malicious agents constitute more than half of the network connectivity, 2) the deviation of the converged limit, from the case where there is no attack, i.e., the true consensus value, can be bounded with probability that approaches 1 exponentially, and 3) correct classification of malicious and legitimate agents can be attained in finite time almost surely. Further, the expected convergence rate decays exponentially as a function of the quality of the trust observations between agents.
Systems and Control (CS)
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms for solving this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, linear convergence is established under additional assumptions. By employing specialized saddle-point subproblem solvers, iD2A and MiD2A attain significantly lower communication and computational complexities than existing algorithms across various scenarios. Finally, we conduct several numerical experiments to validate our theoretical results and to showcase the superior performance of iD2A and MiD2A in practice.
Toward a Harmonized Approach -- Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches
Motivated by recent advances of reinforcement learning and direct data-driven control, we propose policy gradient adaptive control (PGAC) for the linear quadratic regulator (LQR), which uses online closed-loop data to improve the control policy while maintaining stability. Our method adaptively updates the policy in feedback by descending the gradient of the LQR cost and is categorized as indirect, when gradients are computed via an estimated model, versus direct, when gradients are derived from data using sample covariance parameterization. Beyond the vanilla gradient, we also showcase the merits of the natural gradient and Gauss-Newton methods for the policy update. Notably, natural gradient descent bridges the indirect and direct PGAC, and the Gauss-Newton method of the indirect PGAC leads to an adaptive version of the celebrated Hewer's algorithm. To account for the uncertainty from noise, we propose a regularization method for both indirect and direct PGAC. For all the considered PGAC approaches, we show closed-loop stability and convergence of the policy to the optimal LQR gain. Simulations validate our theoretical findings and demonstrate the robustness and computational efficiency of PGAC.
Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving
Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have widely been adopted for autonomous driving applications. While a global route can be pre-computed offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.
comment: 8 pages, 10 figures - Presented at 2025 IEEE 36th Intelligent Vehicles Symposium (IV)
Optimal Droop Control Strategy for Coordinated Voltage Regulation and Power Sharing in Hybrid AC-MTDC Systems
With the growing integration of modular multilevel converters (MMCs) in Multi-Terminal Direct Current (MTDC) transmission systems, there is an increasing need for control strategies that ensure both economic efficiency and robust dynamic performance. This paper presents an enhanced Optimal Power Flow (OPF) framework for hybrid AC-MTDC systems, integrating a novel droop control strategy that coordinates DC voltage and AC frequency regulation. By embedding frequency control loops into the MMCs, the method enables system-wide coordination, enhancing power sharing and improving system resilience under disturbances. The proposed strategy dynamically adjusts converter operating points to minimize generation costs and DC voltage deviations, thus balancing economic objectives with system stability. A modified Nordic test system integrated with a four-terminal MTDC grid is used to validate the approach. Optimization is performed using Julia, while the system's dynamic performance is evaluated through electromagnetic transient simulations with the EMTP software. Case studies across multiple scenarios demonstrate that the proposed method consistently achieves lower generation costs than active power control and adaptive droop control strategy while maintaining stable control characteristics. The results highlight the method's capability to deliver cost-effective operation without compromising performance, offering a promising solution for the coordinated control of future hybrid AC-DC transmission networks.
BURNS: Backward Underapproximate Reachability for Neural-Feedback-Loop Systems
Learning-enabled planning and control algorithms are increasingly popular, but they often lack rigorous guarantees of performance or safety. We introduce an algorithm for computing underapproximate backward reachable sets of nonlinear discrete time neural feedback loops. We then use the backward reachable sets to check goal-reaching properties. Our algorithm is based on overapproximating the system dynamics function to enable computation of underapproximate backward reachable sets through solutions of mixed-integer linear programs. We rigorously analyze the soundness of our algorithm and demonstrate it on a numerical example. Our work expands the class of properties that can be verified for learning-enabled systems.
Backstepping Reach-avoid Controller Synthesis for Multi-input Multi-output Systems with Mixed Relative Degrees
Designing controllers with provable formal guarantees has become an urgent requirement for cyber-physical systems in safety-critical scenarios. Beyond addressing scalability in high-dimensional implementations, controller synthesis methodologies separating safety and reachability objectives may risk optimization infeasibility due to conflicting constraints, thereby significantly undermining their applicability in practical applications. In this paper, by leveraging feedback linearization and backstepping techniques, we present a novel framework for constructing provable reach-avoid formal certificates tailored to multi-input multi-output systems. Based on this, we developed a systematic synthesis approach for controllers with reach-avoid guarantees, which ensures that the outputs of the system eventually enter the predefined target set while staying within the required safe set. Finally, we demonstrate the effectiveness of our method through simulations.
comment: This work has been submitted to the IEEE for possible publication
Dynamic load balancing for cloud systems under heterogeneous setup delays
We consider a distributed cloud service deployed at a set of distinct server pools. Arriving jobs are classified into heterogeneous types, in accordance with their setup times which are differentiated at each of the pools. A dispatcher for each job type controls the balance of load between pools, based on decentralized feedback. The system of rates and queues is modeled by a fluid differential equation system, and analyzed via convex optimization. A first, myopic policy is proposed, based on task delay-to-service. Under a simplified dynamic fluid queue model, we prove global convergence to an equilibrium point which minimizes the mean setup time; however queueing delays are incurred with this method. A second proposal is then developed based on proximal optimization, which explicitly models the setup queue and is proved to reach an optimal equilibrium, devoid of queueing delay. Results are demonstrated through a simulation example.
comment: 20 pages, 1 figure
Artificial Potential Field and Sliding Mode Control for Spacecraft Attitude Maneuver with Actuation and Pointing Constraints
This study investigates the combination of guidance and control strategies for rigid spacecraft attitude reorientation, while dealing with forbidden pointing constraints, actuator limitations, and system uncertainties. These constraints arise due to the presence of bright objects in space that may damage sensitive payloads onboard the spacecraft, and the risk that actuator saturations may compromise closed-loop system stability. Furthermore, spacecraft attitude dynamics are typically affected by parametric uncertainties, external disturbances, and system nonlinearities, which cannot be neglected. In this article, the problem of spacecraft reorientation under pointing and actuation constraints is addressed using a strategy that combines Artificial Potential Field (APF) and Sliding Mode Control (SMC). A rigorous Lyapunov-based analysis yields closed-form expressions for APF/SMC gains, providing explicit mathematical formulas for gain values without the need for iterative computations. These expressions account for angular velocity and control torque limitations, external disturbances, and inertia uncertainties. The robustness of the proposed control strategy is demonstrated through Monte Carlo simulations using a high-fidelity attitude dynamics simulator. Additionally, mu-analysis is employed to assess local stability properties and quantify robustness margins. The results confirm the practical feasibility of the proposed method in real-world space scenarios, highlighting its effectiveness in uncertain and constrained environments.
Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions
Despite significant progress in autonomous navigation, a critical gap remains in ensuring reliable localization in hazardous environments such as tunnels, urban disaster zones, and underground structures. Tunnels present a uniquely difficult scenario: they are not only prone to GNSS signal loss, but also provide little features for visual localization due to their repetitive walls and poor lighting. These conditions degrade conventional vision-based and LiDAR-based systems, which rely on distinguishable environmental features. To address this, we propose a novel sensor fusion framework that integrates a thermal camera with a LiDAR to enable robust localization in tunnels and other perceptually degraded environments. The thermal camera provides resilience in low-light or smoke conditions, while the LiDAR delivers precise depth perception and structural awareness. By combining these sensors, our framework ensures continuous and accurate localization across diverse and dynamic environments. We use an Extended Kalman Filter (EKF) to fuse multi-sensor inputs, and leverages visual odometry and SLAM (Simultaneous Localization and Mapping) techniques to process the sensor data, enabling robust motion estimation and mapping even in GNSS-denied environments. This fusion of sensor modalities not only enhances system resilience but also provides a scalable solution for cyber-physical systems in connected and autonomous vehicles (CAVs). To validate the framework, we conduct tests in a tunnel environment, simulating sensor degradation and visibility challenges. The results demonstrate that our method sustains accurate localization where standard approaches deteriorate due to the tunnels featureless geometry. The frameworks versatility makes it a promising solution for autonomous vehicles, inspection robots, and other cyber-physical systems operating in constrained, perceptually poor environments.
comment: Submitted to IAVVC 2025
Event-Triggered GAT-LSTM Framework for Attack Detection in Heating, Ventilation, and Air Conditioning Systems
Heating, Ventilation, and Air Conditioning (HVAC) systems are essential for maintaining indoor environmental quality, but their interconnected nature and reliance on sensor networks make them vulnerable to cyber-physical attacks. Such attacks can interrupt system operations and risk leaking sensitive personal information through measurement data. In this paper, we propose a novel attack detection framework for HVAC systems, integrating an Event-Triggering Unit (ETU) for local monitoring and a cloud-based classification system using the Graph Attention Network (GAT) and the Long Short-Term Memory (LSTM) network. The ETU performs a binary classification to identify potential anomalies and selectively triggers encrypted data transmission to the cloud, significantly reducing communication cost. The cloud-side GAT module models the spatial relationships among HVAC components, while the LSTM module captures temporal dependencies across encrypted state sequences to classify the attack type. Our approach is evaluated on datasets that simulate diverse attack scenarios. Compared to GAT-only (94.2% accuracy) and LSTM-only (91.5%) ablations, our full GAT-LSTM model achieves 98.8% overall detection accuracy and reduces data transmission to 15%. These results demonstrate that the proposed framework achieves high detection accuracy while preserving data privacy by using the spatial-temporal characteristics of HVAC systems and minimizing transmission costs through event-triggered communication.
Sequentially learning regions of attraction from data
The paper is dedicated to data-driven analysis of dynamical systems. It deals with certifying the basin of attraction of a stable equilibrium for an unknown dynamical system. It is supposed that point-wise evaluation of the right-hand side of the ordinary differential equation governing the system is available for a set of points in the state space. Technically, a Piecewise Affine Lyapunov function will be constructed iteratively using an optimisation-based technique for the effective validation of the certificates. As a main contribution, whenever those certificates are violated locally, a refinement of the domain and the associated tessellation is produced, thus leading to an improvement in the description of the domain of attraction.
comment: IEEE MED2025 conference
Power Loss and Temperature Distribution in Coil of PFC Inductor with Air Gap for Multimode Operation
Power converters inherently display non-linear load characteristics, resulting in a high level of mains harmonics, and hence the necessity of implementing Power Factor Correction (PFC). Active PFC circuitry typically comprises an inductor and a power switch to control and alter the input current so that it matches, in shape and phase, the input voltage. This modelling of the waveforms can be performed by means of distinct conduction modes of the PFC inductor. The digital controller implemented in the constructed and investigated boost-type PFC converter can be programmed to operate in discontinuous conduction mode (DCM), continuous conduction mode (CCM), or a combination of the two. The individual modes of operation, via distinct PFC inductor current waveforms, impact the overall efficiency of power conversion and, by extension, temperature distribution in the magnetic component. This paper investigates how the examined conduction modes bear on distinct power-loss mechanisms present in the PFC inductor, including high-frequency eddy-current-generating phenomena, and the fringing effect in particular. As demonstrated herein, the DCM operation, for the set output power level, exhibits exacerbated power dissipation in the winding of the inductor due to the somewhat increased RSM value of the current and the intensified fringing magnetic flux at an air gap. The latter assertion will undergo further, more quantitatively focused research. Finally, the construction of the coil was optimised to reduce power loss by diminishing eddy-current mechanisms.
Learning-based Homothetic Tube MPC
In this paper, we study homothetic tube model predictive control (MPC) of discrete-time linear systems subject to bounded additive disturbance and mixed constraints on the state and input. Different from most existing work on robust MPC, we assume that the true disturbance set is unknown but a conservative surrogate is available a priori. Leveraging the real-time data, we develop an online learning algorithm to approximate the true disturbance set. This approximation and the corresponding constraints in the MPC optimisation are updated online using computationally convenient linear programs. We provide statistical gaps between the true and learned disturbance sets, based on which, probabilistic recursive feasibility of homothetic tube MPC problems is discussed. Numerical simulations are provided to demonstrate the efficacy of our proposed algorithm and compare with state-of-the-art MPC algorithms.
comment: Accepted for presentation at the 23rd European Control Conference
Multi-Class Stackelberg Games for the Co-Design of Networked Systems
We investigate a co-design problem, encompassing simultaneous design of system infrastructure and control, through a game-theoretical framework. To this end, we propose the co-design problem as a two-layer hierarchical strategic interaction. At the upper layer, a leader (or multiple leaders) determines system design parameters, while at the lower layer, a follower (or multiple followers) optimizes the control strategy. To capture this hierarchy, we propose four novel classes of Stackelberg games that integrate diverse strategic behaviors, including combinations of cooperative and non-cooperative interactions across two different layers. Notably, the leaders' interactions are represented using a normal-form game, whereas the followers' interactions are modeled by different games (dynamic games in discrete time). These distinct game structures result in a Stackelberg game that accommodates different game types per layer, and/or supports heterogeneous strategic behaviors involving cooperation and non-cooperation simultaneously. Learning algorithms using the best-response dynamics are used to solve the game problems when considering a discrete strategic space for the leaders. The efficacy of the proposed approach is demonstrated through an application to the co-design of the Barcelona drinking water network.
Enabling Robots to Autonomously Search Dynamic Cluttered Post-Disaster Environments
Robots will bring search and rescue (SaR) in disaster response to another level, in case they can autonomously take over dangerous SaR tasks from humans. A main challenge for autonomous SaR robots is to safely navigate in cluttered environments with uncertainties, while avoiding static and moving obstacles. We propose an integrated control framework for SaR robots in dynamic, uncertain environments, including a computationally efficient heuristic motion planning system that provides a nominal (assuming there are no uncertainties) collision-free trajectory for SaR robots and a robust motion tracking system that steers the robot to track this reference trajectory, taking into account the impact of uncertainties. The control architecture guarantees a balanced trade-off among various SaR objectives, while handling the hard constraints, including safety. The results of various computer-based simulations, presented in this paper, showed significant out-performance (of up to 42.3%) of the proposed integrated control architecture compared to two commonly used state-of-the-art methods (Rapidly-exploring Random Tree and Artificial Potential Function) in reaching targets (e.g., trapped victims in SaR) safely, collision-free, and in the shortest possible time.
Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots
Autonomous robots deployed in unknown search-and-rescue (SaR) environments can significantly improve the efficiency of the mission by assisting in fast localisation and rescue of the trapped victims. We propose a novel integrated hierarchical control architecture, called model predictive fuzzy control (MPFC), for autonomous mission planning of multi-robot SaR systems that should efficiently map an unknown environment: We combine model predictive control (MPC) and fuzzy logic control (FLC), where the robots are locally controlled by computationally efficient FLC controllers, and the parameters of these local controllers are tuned via a centralised MPC controller, in a regular or event-triggered manner. The proposed architecture provides three main advantages: (1) The control decisions are made by the FLC controllers, thus the real-time computation time is affordable. (2) The centralised MPC controller optimises the performance criteria with a global and predictive vision of the system dynamics, and updates the parameters of the FLC controllers accordingly. (3) FLC controllers are heuristic by nature and thus do not take into account optimality in their decisions, while the tuned parameters via the MPC controller can indirectly incorporate some level of optimality in local decisions of the robots. A simulation environment for victim detection in a disaster environment was designed in MATLAB using discrete, 2-D grid-based models. While being comparable from the point of computational efficiency, the integrated MPFC architecture improves the performance of the multi-robot SaR system compared to decentralised FLC controllers. Moreover, the performance of MPFC is comparable to the performance of centralised MPC for path planning of SaR robots, whereas MPFC requires significantly less computational resources, since the number of the optimisation variables in the control problem are reduced.
Non-linear dynamics of multibody systems: a system-based approach
This paper presents causal block-diagram models to represent the equations of motion of multi-body systems in a very compact and simple closed form. Both the forward dynamics (from the forces and torques imposed at the various degrees-of-freedom to the motions of these degrees-of-freedom) or the inverse dynamics (from the motions imposed at the degrees-of-freedom to the resulting forces and torques) can be considered and described by a block diagram model. This work extends the Two-Input Two-Output Port (TITOP) theory by including all non-linear terms and uniform or gravitational acceleration fields. Connection among different blocks is possible through the definition of the motion vector. The model of a system composed of a floating base, rigid bodies, revolute and prismatic joints, working under gravity is developed to illustrate the methodology. The proposed model is validated by simulation and cross-checking with a model built using an alternative modeling tool on a scenario where the nonlinear terms are determining.
comment: submitteds to the Journal of Multobody System Dynamics
Integrated Sensing, Computing, Communication, and Control for Time-Sequence-Based Semantic Communications
In the upcoming industrial internet of things (IIoT) era, a surge of task-oriented applications will rely on real-time wireless control systems (WCSs). For these systems, ultra-reliable and low-latency wireless communication will be crucial to ensure the timely transmission of control information. To achieve this purpose, we propose a novel time-sequence-based semantic communication paradigm, where an integrated sensing, computing, communication, and control (ISC3) architecture is developed to make sensible semantic inference (SI) for the control information over time sequences, enabling adaptive control of the robot. However, due to the causal correlations in the time sequence, the control information does not present the Markov property. To address this challenge, we compute the mutual information of the control information sensed at the transmitter (Tx) over different time and identify their temporal semantic correlation via a semantic feature extractor (SFE) module. By this means, highly correlated information transmission can be avoided, thus greatly reducing the communication overhead. Meanwhile, a semantic feature reconstructor (SFR) module is employed at the receiver (Rx) to reconstruct the control information based on the previously received one if the information transmission is not activated at the Tx. Furthermore, a control gain policy is also employed at the Rx to adaptively adjust the control gain for the controlled target based on several practical aspects such as the quality of the information transmission from the Tx to the Rx. We design the neural network structures of the above modules/policies and train their parameters by a novel hybrid reward multi-agent deep reinforcement learning framework. On-site experiments are conducted to evaluate the performance of our proposed method in practice, which shows significant gains over other baseline schemes.
comment: This version of the manuscript was submitted to IEEE Transactions on Communications for possible publication
The ISAC systems aided by MIMO, RIS and with Beamforming Techniques
This paper explores the integration of communication and sensing in modern wireless systems through the configuration of BS and RIS antenna elements. By leveraging time multiplexing for both communication and sensing, the proposed system optimizes spectral efficiency and operational performance. The use of static RIS configurations tailored to specific environments eliminates the need for dynamic reconfigurations, enhancing system agility, reducing processing complexity, and improving sensing accuracy. The system incorporates trilateration, angle of arrival, and time of arrival techniques to enable precise user localization by combining signals reflected along multiple paths. This method helps choose the best connections and lowers sensing costs while preventing interference with communication data, highlighting the need to bring together new technologies like passive and adaptive beamforming in one system.
comment: 32 pages, 7 figures, 2 tables
Global Task-aware Fault Detection, Identification For On-Orbit Multi-Spacecraft Collaborative Inspection
In this paper, we present a global-to-local task-aware fault detection and identification algorithm to detect failures in a multi-spacecraft system performing a collaborative inspection (referred to as global) task. The inspection task is encoded as a cost functional $\costH$ that informs global (task allocation and assignment) and local (agent-level) decision-making. The metric $\costH$ is a function of the inspection sensor model, and the agent full-pose. We use the cost functional $\costH$ to design a metric that compares the expected and actual performance to detect the faulty agent using a threshold. We use higher-order cost gradients $\costH$ to derive a new metric to identify the type of fault, including task-specific sensor fault, an agent-level actuator, and sensor faults. Furthermore, we propose an approach to design adaptive thresholds for each fault mentioned above to incorporate the time dependence of the inspection task. We demonstrate the efficacy of the proposed method empirically, by simulating and detecting faults (such as inspection sensor faults, actuators, and sensor faults) in a low-Earth orbit collaborative spacecraft inspection task using the metrics and the threshold designed using the global task cost $\costH$.
comment: published. 33rd AAS AIAA Conference 2023
Coevolution of Actions and Opinions in Networks of Coordinating and Anti-Coordinating Agents
In this paper, we investigate the dynamics of coordinating and anti-coordinating agents in a coevolutionary model for actions and opinions. In the model, the individuals of a population interact on a two-layer network, sharing their opinions and observing others' action, while revising their own opinions and actions according to a game-theoretic mechanism, grounded in the social psychology literature. First, we consider the scenario of coordinating agents, where convergence to a Nash equilibrium (NE) is guaranteed. We identify conditions for reaching consensus configurations and establish regions of attraction for these equilibria. Second, we study networks of anti-coordinating agents. In this second scenario, we prove that all trajectories converge to a NE by leveraging potential game theory. Then, we establish analytical conditions on the network structure and model parameters to guarantee the existence of consensus and polarized equilibria, characterizing their regions of attraction.
comment: Manuscript under review as a journal submission
Constraint Selection in Optimization-Based Controllers
Human-machine collaboration often involves constrained optimization problems for decision-making processes. However, when the machine is a dynamical system with a continuously evolving state, infeasibility due to multiple conflicting constraints can lead to dangerous outcomes. In this work, we propose a heuristic-based method that resolves infeasibility at every time step by selectively disregarding a subset of soft constraints based on the past values of the Lagrange multipliers. Compared to existing approaches, our method requires the solution of a smaller optimization problem to determine feasibility, resulting in significantly faster computation. Through a series of simulations, we demonstrate that our algorithm achieves performance comparable to state-of-the-art methods while offering improved computational efficiency.
comment: Submitted to IEEE Control Systems Letters (L-CSS)
An Overview of the Prospects and Challenges of Using Artificial Intelligence for Energy Management Systems in Microgrids
Microgrids have emerged as a pivotal solution in the quest for a sustainable and energy-efficient future. While microgrids offer numerous advantages, they are also prone to issues related to reliably forecasting renewable energy demand and production, protecting against cyberattacks, controlling operational costs, optimizing power flow, and regulating the performance of energy management systems (EMS). Tackling these energy management challenges is essential to facilitate microgrid applications and seamlessly incorporate renewable energy resources. Artificial intelligence (AI) has recently demonstrated immense potential for optimizing energy management in microgrids, providing efficient and reliable solutions. This paper highlights the combined benefits of enabling AI-based methodologies in the energy management systems of microgrids by examining the applicability and efficiency of AI-based EMS in achieving specific technical and economic objectives. The paper also points out several future research directions that promise to spearhead AI-driven EMS, namely the development of self-healing microgrids, integration with blockchain technology, use of Internet of things (IoT), and addressing interpretability, data privacy, scalability, and the prospects to generative AI in the context of future AI-based EMS.
comment: 70 pages, 7 figures
A Decision-Focused Predict-then-Bid Framework for Strategic Energy Storage
This paper introduces a novel decision-focused framework for energy storage arbitrage bidding. Inspired by the bidding process for energy storage in electricity markets, we propose a predict-then-bid end-to-end method incorporating the storage arbitrage optimization and market clearing models. This is achieved through a tri-layer framework that combines a price prediction layer with a two-stage optimization problem: an energy storage optimization layer and a market-clearing optimization layer. We leverage the implicit function theorem for gradient computation in the first optimization layer and incorporate a perturbation-based approach into the decision-focused loss function to ensure differentiability in the market-clearing layer. Numerical experiments using electricity market data from New York demonstrate that our bidding design substantially outperforms existing methods, achieving the highest profits and showcasing the effectiveness of the proposed approach.
TTT: A Temporal Refinement Heuristic for Tenuously Tractable Discrete Time Reachability Problems
Reachable set computation is an important tool for analyzing control systems. Simulating a control system can show general trends, but a formal tool like reachability analysis can provide guarantees of correctness. Reachability analysis for complex control systems, e.g., with nonlinear dynamics and/or a neural network controller, is often either slow or overly conservative. To address these challenges, much literature has focused on spatial refinement, i.e., tuning the discretization of the input sets and intermediate reachable sets. This paper introduces the idea of temporal refinement: automatically choosing when along the horizon of the reachability problem to execute slow symbolic queries which incur less approximation error versus fast concrete queries which incur more approximation error. Temporal refinement can be combined with other refinement approaches as an additional tool to trade off tractability and tightness in approximate reachable set computation. We introduce a temporal refinement algorithm and demonstrate its effectiveness at computing approximate reachable sets for nonlinear systems with neural network controllers. We calculate reachable sets with varying computational budget and show that our algorithm can generate approximate reachable sets with a similar amount of error to the baseline in 20-70% less time.
comment: To appear in the proceedings of the American Control Conference (ACC) 2025
The statistical spread of transmission outages on a fast protection time scale based on utility data
When there is a fault, the protection system automatically removes one or more transmission lines on a fast time scale of less than one minute. The outaged lines form a pattern in the transmission network. We extract these patterns from utility outage data, determine some key statistics of these patterns, and then show how to generate new patterns consistent with these statistics. The generated patterns provide a new and easily feasible way to model the overall effect of the protection system at the scale of a large transmission system. This new generative modeling of protection is expected to contribute to simulations of disturbances in large grids so that they can better quantify the risk of blackouts. Analysis of the pattern sizes suggests an index that describes how much outages spread in the transmission network at the fast timescale.
Quantifying the Aggregate Flexibility of Electric Vehicle Charging Stations for Market-based Congestion Management Services
Electric vehicles (EVs) play a crucial role in the transition towards sustainable modes of transportation and thus are critical to the energy transition. As their number grows, managing the aggregate power of EV charging is crucial to maintain grid stability and mitigate congestion. This study analyses more than 500 thousand real charging transactions in the Netherlands to explore the challenge and opportunity for the energy system presented by EV growth and smart charging flexibility. Specifically, it analyses the collective ability to provide congestion management services according to the specifications of those services in the Netherlands. In this study, a data-driven model of charging behaviour is created to explore the implications of delivering dependable congestion management services at various aggregation levels and types of service. The probabilistic ability to offer different flexibility products, namely, redispatch and capacity limitation, for congestion management, is assessed for different categories of charging stations (CS) and dispatch strategies. These probabilities can help EV aggregators, such as charging point operators, make informed decisions about offering congestion mitigation products per relevant regulations and distribution system operators to assess their potential. Further, it is shown how machine learning models can be incorporated to predict the day-ahead consumption, followed by operationally predicting redispatch flexibility. The findings demonstrate that the timing of EV arrivals, departures, and connections plays a crucial role in determining the feasibility of product offerings, and dependable services can generally be delivered using a sufficiently large number of CSs.
comment: 15 Pages,12 figures, 2 tables
AI-Aided Kalman Filters
The Kalman filter (KF) and its variants are among the most celebrated algorithms in signal processing. These methods are used for state estimation of dynamic systems by relying on mathematical representations in the form of simple state-space (SS) models, which may be crude and inaccurate descriptions of the underlying dynamics. Emerging data-centric artificial intelligence (AI) techniques tackle these tasks using deep neural networks (DNNs), which are model-agnostic. Recent developments illustrate the possibility of fusing DNNs with classic Kalman-type filtering, obtaining systems that learn to track in partially known dynamics. This article provides a tutorial-style overview of design approaches for incorporating AI in aiding KF-type algorithms. We review both generic and dedicated DNN architectures suitable for state estimation, and provide a systematic presentation of techniques for fusing AI tools with KFs and for leveraging partial SS modeling and data, categorizing design approaches into task-oriented and SS model-oriented. The usefulness of each approach in preserving the individual strengths of model-based KFs and data-driven DNNs is investigated in a qualitative and quantitative study, whose code is publicly available, illustrating the gains of hybrid model-based/data-driven designs. We also discuss existing challenges and future research directions that arise from fusing AI and Kalman-type algorithms.
comment: Submitted to the IEEE Signal Processing Magazine
Neural Configuration-Space Barriers for Manipulation Planning and Control
Planning and control for high-dimensional robot manipulators in cluttered, dynamic environments require both computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as robot body representations, we propose a unified framework for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduce uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that explicitly accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a 6-DoF xArm manipulator show that our neural CDF barrier formulation enables efficient planning and robust real-time safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments
We introduce a novel method for mobile robot navigation in dynamic, unknown environments, leveraging onboard sensing and distributionally robust optimization to impose probabilistic safety constraints. Our method introduces a distributionally robust control barrier function (DR-CBF) that directly integrates noisy sensor measurements and state estimates to define safety constraints. This approach is applicable to a wide range of control-affine dynamics, generalizable to robots with complex geometries, and capable of operating at real-time control frequencies. Coupled with a control Lyapunov function (CLF) for path following, the proposed CLF-DR-CBF control synthesis method achieves safe, robust, and efficient navigation in challenging environments. We demonstrate the effectiveness and robustness of our approach for safe autonomous navigation under uncertainty in simulations and real-world experiments with differential-drive robots.
comment: Project page: https://existentialrobotics.org/DRO_Safe_Navigation
Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning
Transport systems on networks are crucial in various applications, but face a significant risk of being adversely affected by unforeseen circumstances such as disasters. The application of entropy-regularized optimal transport (OT) on graph structures has been investigated to enhance the robustness of transport on such networks. In this study, we propose an imitation-regularized OT (I-OT) that mathematically incorporates prior knowledge into the robustness of OT. This method is expected to enhance interpretability by integrating human insights into robustness and to accelerate practical applications. Furthermore, we mathematically verify the robustness of I-OT and discuss how these robustness properties relate to real-world applications. The effectiveness of this method is validated through a logistics simulation using automotive parts data.
comment: Accepted for publication in: IEEE Control Systems Letters, vol. 8, pp. 3470-3475, 2024, doi: 10.1109/LCSYS.2025.3546804
CalibRefine: Deep Learning-Based Online Automatic Targetless LiDAR-Camera Calibration with Iterative and Attention-Driven Post-Refinement
Accurate multi-sensor calibration is essential for deploying robust perception systems in applications such as autonomous driving and intelligent transportation. Existing LiDAR-camera calibration methods often rely on manually placed targets, preliminary parameter estimates, or intensive data preprocessing, limiting their scalability and adaptability in real-world settings. In this work, we propose a fully automatic, targetless, and online calibration framework, CalibRefine, which directly processes raw LiDAR point clouds and camera images. Our approach is divided into four stages: (1) a Common Feature Discriminator that leverages relative spatial positions, visual appearance embeddings, and semantic class cues to identify and generate reliable LiDAR-camera correspondences, (2) a coarse homography-based calibration that uses the matched feature correspondences to estimate an initial transformation between the LiDAR and camera frames, serving as the foundation for further refinement, (3) an iterative refinement to incrementally improve alignment as additional data frames become available, and (4) an attention-based refinement that addresses non-planar distortions by leveraging a Vision Transformer and cross-attention mechanisms. Extensive experiments on two urban traffic datasets demonstrate that CalibRefine achieves high-precision calibration with minimal human input, outperforming state-of-the-art targetless methods and matching or surpassing manually tuned baselines. Our results show that robust object-level feature matching, combined with iterative refinement and self-supervised attention-based refinement, enables reliable sensor alignment in complex real-world conditions without ground-truth matrices or elaborate preprocessing. Code is available at https://github.com/radar-lab/Lidar_Camera_Automatic_Calibration
Context-aware LLM-based Safe Control Against Latent Risks
Autonomous control systems face significant challenges in performing complex tasks in the presence of latent risks. To address this, we propose an integrated framework that combines Large Language Models (LLMs), numerical optimization, and optimization-based control to facilitate efficient subtask learning while ensuring safety against latent risks. The framework decomposes complex tasks into a sequence of context-aware subtasks that account for latent risks. These subtasks and their parameters are then refined through a multi-time-scale process: high-layer multi-turn in-context learning, mid-layer LLM Chain-of-Thought reasoning and numerical optimization, and low-layer model predictive control. The framework iteratively improves decisions by leveraging qualitative feedback and optimized trajectory data from lower-layer optimization processes and a physics simulator. We validate the proposed framework through simulated case studies involving robot arm and autonomous vehicle scenarios. The experiments demonstrate that the proposed framework can mediate actions based on the context and latent risks and learn complex behaviors efficiently.
Online Controller Synthesis for Robot Collision Avoidance: A Case Study
The inherent uncertainty of dynamic environments poses significant challenges for modeling robot behavior, particularly in tasks such as collision avoidance. This paper presents an online controller synthesis framework tailored for robots equipped with deep learning-based perception components, with a focus on addressing distribution shifts. Our approach integrates periodic monitoring and repair mechanisms for the deep neural network perception component, followed by uncertainty reassessment. These uncertainty evaluations are injected into a parametric discrete-time markov chain, enabling the synthesis of robust controllers via probabilistic model checking. To ensure high system availability during the repair process, we propose a dual-component configuration that seamlessly transitions between operational states. Through a case study on robot collision avoidance, we demonstrate the efficacy of our method, showcasing substantial performance improvements over baseline approaches. This work provides a comprehensive and scalable solution for enhancing the safety and reliability of autonomous systems operating in uncertain environments.
comment: The collaborators disagree to publish on this platform
Uncertainty-Aware Digital Twins: Robust Model Predictive Control using Time-Series Deep Quantile Learning
Digital Twins, virtual replicas of physical systems that enable real-time monitoring, model updates, predictions, and decision-making, present novel avenues for proactive control strategies for autonomous systems. However, achieving real-time decision-making in Digital Twins considering uncertainty necessitates an efficient uncertainty quantification (UQ) approach and optimization driven by accurate predictions of system behaviors, which remains a challenge for learning-based methods. This paper presents a simultaneous multi-step robust model predictive control (MPC) framework that incorporates real-time decision-making with uncertainty awareness for Digital Twin systems. Leveraging a multistep ahead predictor named Time-Series Dense Encoder (TiDE) as the surrogate model, this framework differs from conventional MPC models that provide only one-step ahead predictions. In contrast, TiDE can predict future states within the prediction horizon in a one-shot, significantly accelerating MPC. Furthermore, quantile regression is employed with the training of TiDE to perform flexible while computationally efficient UQ on data uncertainty. Consequently, with the deep learning quantiles, the robust MPC problem is formulated into a deterministic optimization problem and provides a safety buffer that accommodates disturbances to enhance constraint satisfaction rate. As a result, the proposed method outperforms existing robust MPC methods by providing less-conservative UQ and has demonstrated efficacy in an engineering case study involving Directed Energy Deposition (DED) additive manufacturing. This proactive while uncertainty-aware control capability positions the proposed method as a potent tool for future Digital Twin applications and real-time process control in engineering systems.
Characterizing Trust and Resilience in Distributed Consensus for Cyberphysical Systems
This work considers the problem of resilient consensus where stochastic values of trust between agents are available. Specifically, we derive a unified mathematical framework to characterize convergence, deviation of the consensus from the true consensus value, and expected convergence rate, when there exists additional information of trust between agents. We show that under certain conditions on the stochastic trust values and consensus protocol: 1) almost sure convergence to a common limit value is possible even when malicious agents constitute more than half of the network connectivity, 2) the deviation of the converged limit, from the case where there is no attack, i.e., the true consensus value, can be bounded with probability that approaches 1 exponentially, and 3) correct classification of malicious and legitimate agents can be attained in finite time almost surely. Further, the expected convergence rate decays exponentially as a function of the quality of the trust observations between agents.
Systems and Control (EESS)
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms for solving this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, linear convergence is established under additional assumptions. By employing specialized saddle-point subproblem solvers, iD2A and MiD2A attain significantly lower communication and computational complexities than existing algorithms across various scenarios. Finally, we conduct several numerical experiments to validate our theoretical results and to showcase the superior performance of iD2A and MiD2A in practice.
Toward a Harmonized Approach -- Requirement-based Structuring of a Safety Assurance Argumentation for Automated Vehicles
Despite increasing testing operation on public roads, media reports on incidents show that safety issues remain to this day. One major cause factoring into this circumstance is high development uncertainty that manufacturers face when deploying these systems in an open context. In particular, one challenge is establishing a valid argument at design time that the vehicle will exhibit reasonable residual risk when operating in its intended operational design domain. Regulations, such as the European Implementing Regulation 2022/1426, require manufacturers to provide a safety assurance argumentation for SAE-Level-4 automated vehicles. While there is extensive literature on assurance cases for safety-critical systems, the domain of automated driving lacks explicit requirements regarding the creation of safety assurance argumentations. In this paper, we aim to narrow this gap by elaborating a requirement-based approach. We derive structural requirements for an argumentation from literature and supplement these with requirements derived from stakeholder concerns. We implement the requirements, yielding a proposal for an overall argumentation structure. The resulting "safety arguments" argue over four topic complexes: The developed product, the underlying process including its conformance/compliance to standards/laws, as well as the argumentations' context and soundness. Finally, we instantiate this structure with respect to domain-specific needs and principles.
comment: submitted for publication
Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches
Motivated by recent advances of reinforcement learning and direct data-driven control, we propose policy gradient adaptive control (PGAC) for the linear quadratic regulator (LQR), which uses online closed-loop data to improve the control policy while maintaining stability. Our method adaptively updates the policy in feedback by descending the gradient of the LQR cost and is categorized as indirect, when gradients are computed via an estimated model, versus direct, when gradients are derived from data using sample covariance parameterization. Beyond the vanilla gradient, we also showcase the merits of the natural gradient and Gauss-Newton methods for the policy update. Notably, natural gradient descent bridges the indirect and direct PGAC, and the Gauss-Newton method of the indirect PGAC leads to an adaptive version of the celebrated Hewer's algorithm. To account for the uncertainty from noise, we propose a regularization method for both indirect and direct PGAC. For all the considered PGAC approaches, we show closed-loop stability and convergence of the policy to the optimal LQR gain. Simulations validate our theoretical findings and demonstrate the robustness and computational efficiency of PGAC.
Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving
Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have widely been adopted for autonomous driving applications. While a global route can be pre-computed offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.
comment: 8 pages, 10 figures - Presented at 2025 IEEE 36th Intelligent Vehicles Symposium (IV)
Optimal Droop Control Strategy for Coordinated Voltage Regulation and Power Sharing in Hybrid AC-MTDC Systems
With the growing integration of modular multilevel converters (MMCs) in Multi-Terminal Direct Current (MTDC) transmission systems, there is an increasing need for control strategies that ensure both economic efficiency and robust dynamic performance. This paper presents an enhanced Optimal Power Flow (OPF) framework for hybrid AC-MTDC systems, integrating a novel droop control strategy that coordinates DC voltage and AC frequency regulation. By embedding frequency control loops into the MMCs, the method enables system-wide coordination, enhancing power sharing and improving system resilience under disturbances. The proposed strategy dynamically adjusts converter operating points to minimize generation costs and DC voltage deviations, thus balancing economic objectives with system stability. A modified Nordic test system integrated with a four-terminal MTDC grid is used to validate the approach. Optimization is performed using Julia, while the system's dynamic performance is evaluated through electromagnetic transient simulations with the EMTP software. Case studies across multiple scenarios demonstrate that the proposed method consistently achieves lower generation costs than active power control and adaptive droop control strategy while maintaining stable control characteristics. The results highlight the method's capability to deliver cost-effective operation without compromising performance, offering a promising solution for the coordinated control of future hybrid AC-DC transmission networks.
BURNS: Backward Underapproximate Reachability for Neural-Feedback-Loop Systems
Learning-enabled planning and control algorithms are increasingly popular, but they often lack rigorous guarantees of performance or safety. We introduce an algorithm for computing underapproximate backward reachable sets of nonlinear discrete time neural feedback loops. We then use the backward reachable sets to check goal-reaching properties. Our algorithm is based on overapproximating the system dynamics function to enable computation of underapproximate backward reachable sets through solutions of mixed-integer linear programs. We rigorously analyze the soundness of our algorithm and demonstrate it on a numerical example. Our work expands the class of properties that can be verified for learning-enabled systems.
Backstepping Reach-avoid Controller Synthesis for Multi-input Multi-output Systems with Mixed Relative Degrees
Designing controllers with provable formal guarantees has become an urgent requirement for cyber-physical systems in safety-critical scenarios. Beyond addressing scalability in high-dimensional implementations, controller synthesis methodologies separating safety and reachability objectives may risk optimization infeasibility due to conflicting constraints, thereby significantly undermining their applicability in practical applications. In this paper, by leveraging feedback linearization and backstepping techniques, we present a novel framework for constructing provable reach-avoid formal certificates tailored to multi-input multi-output systems. Based on this, we developed a systematic synthesis approach for controllers with reach-avoid guarantees, which ensures that the outputs of the system eventually enter the predefined target set while staying within the required safe set. Finally, we demonstrate the effectiveness of our method through simulations.
comment: This work has been submitted to the IEEE for possible publication
Dynamic load balancing for cloud systems under heterogeneous setup delays
We consider a distributed cloud service deployed at a set of distinct server pools. Arriving jobs are classified into heterogeneous types, in accordance with their setup times which are differentiated at each of the pools. A dispatcher for each job type controls the balance of load between pools, based on decentralized feedback. The system of rates and queues is modeled by a fluid differential equation system, and analyzed via convex optimization. A first, myopic policy is proposed, based on task delay-to-service. Under a simplified dynamic fluid queue model, we prove global convergence to an equilibrium point which minimizes the mean setup time; however queueing delays are incurred with this method. A second proposal is then developed based on proximal optimization, which explicitly models the setup queue and is proved to reach an optimal equilibrium, devoid of queueing delay. Results are demonstrated through a simulation example.
comment: 20 pages, 1 figure
Artificial Potential Field and Sliding Mode Control for Spacecraft Attitude Maneuver with Actuation and Pointing Constraints
This study investigates the combination of guidance and control strategies for rigid spacecraft attitude reorientation, while dealing with forbidden pointing constraints, actuator limitations, and system uncertainties. These constraints arise due to the presence of bright objects in space that may damage sensitive payloads onboard the spacecraft, and the risk that actuator saturations may compromise closed-loop system stability. Furthermore, spacecraft attitude dynamics are typically affected by parametric uncertainties, external disturbances, and system nonlinearities, which cannot be neglected. In this article, the problem of spacecraft reorientation under pointing and actuation constraints is addressed using a strategy that combines Artificial Potential Field (APF) and Sliding Mode Control (SMC). A rigorous Lyapunov-based analysis yields closed-form expressions for APF/SMC gains, providing explicit mathematical formulas for gain values without the need for iterative computations. These expressions account for angular velocity and control torque limitations, external disturbances, and inertia uncertainties. The robustness of the proposed control strategy is demonstrated through Monte Carlo simulations using a high-fidelity attitude dynamics simulator. Additionally, mu-analysis is employed to assess local stability properties and quantify robustness margins. The results confirm the practical feasibility of the proposed method in real-world space scenarios, highlighting its effectiveness in uncertain and constrained environments.
Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions
Despite significant progress in autonomous navigation, a critical gap remains in ensuring reliable localization in hazardous environments such as tunnels, urban disaster zones, and underground structures. Tunnels present a uniquely difficult scenario: they are not only prone to GNSS signal loss, but also provide little features for visual localization due to their repetitive walls and poor lighting. These conditions degrade conventional vision-based and LiDAR-based systems, which rely on distinguishable environmental features. To address this, we propose a novel sensor fusion framework that integrates a thermal camera with a LiDAR to enable robust localization in tunnels and other perceptually degraded environments. The thermal camera provides resilience in low-light or smoke conditions, while the LiDAR delivers precise depth perception and structural awareness. By combining these sensors, our framework ensures continuous and accurate localization across diverse and dynamic environments. We use an Extended Kalman Filter (EKF) to fuse multi-sensor inputs, and leverages visual odometry and SLAM (Simultaneous Localization and Mapping) techniques to process the sensor data, enabling robust motion estimation and mapping even in GNSS-denied environments. This fusion of sensor modalities not only enhances system resilience but also provides a scalable solution for cyber-physical systems in connected and autonomous vehicles (CAVs). To validate the framework, we conduct tests in a tunnel environment, simulating sensor degradation and visibility challenges. The results demonstrate that our method sustains accurate localization where standard approaches deteriorate due to the tunnels featureless geometry. The frameworks versatility makes it a promising solution for autonomous vehicles, inspection robots, and other cyber-physical systems operating in constrained, perceptually poor environments.
comment: Submitted to IAVVC 2025
Event-Triggered GAT-LSTM Framework for Attack Detection in Heating, Ventilation, and Air Conditioning Systems
Heating, Ventilation, and Air Conditioning (HVAC) systems are essential for maintaining indoor environmental quality, but their interconnected nature and reliance on sensor networks make them vulnerable to cyber-physical attacks. Such attacks can interrupt system operations and risk leaking sensitive personal information through measurement data. In this paper, we propose a novel attack detection framework for HVAC systems, integrating an Event-Triggering Unit (ETU) for local monitoring and a cloud-based classification system using the Graph Attention Network (GAT) and the Long Short-Term Memory (LSTM) network. The ETU performs a binary classification to identify potential anomalies and selectively triggers encrypted data transmission to the cloud, significantly reducing communication cost. The cloud-side GAT module models the spatial relationships among HVAC components, while the LSTM module captures temporal dependencies across encrypted state sequences to classify the attack type. Our approach is evaluated on datasets that simulate diverse attack scenarios. Compared to GAT-only (94.2% accuracy) and LSTM-only (91.5%) ablations, our full GAT-LSTM model achieves 98.8% overall detection accuracy and reduces data transmission to 15%. These results demonstrate that the proposed framework achieves high detection accuracy while preserving data privacy by using the spatial-temporal characteristics of HVAC systems and minimizing transmission costs through event-triggered communication.
Sequentially learning regions of attraction from data
The paper is dedicated to data-driven analysis of dynamical systems. It deals with certifying the basin of attraction of a stable equilibrium for an unknown dynamical system. It is supposed that point-wise evaluation of the right-hand side of the ordinary differential equation governing the system is available for a set of points in the state space. Technically, a Piecewise Affine Lyapunov function will be constructed iteratively using an optimisation-based technique for the effective validation of the certificates. As a main contribution, whenever those certificates are violated locally, a refinement of the domain and the associated tessellation is produced, thus leading to an improvement in the description of the domain of attraction.
comment: IEEE MED2025 conference
Power Loss and Temperature Distribution in Coil of PFC Inductor with Air Gap for Multimode Operation
Power converters inherently display non-linear load characteristics, resulting in a high level of mains harmonics, and hence the necessity of implementing Power Factor Correction (PFC). Active PFC circuitry typically comprises an inductor and a power switch to control and alter the input current so that it matches, in shape and phase, the input voltage. This modelling of the waveforms can be performed by means of distinct conduction modes of the PFC inductor. The digital controller implemented in the constructed and investigated boost-type PFC converter can be programmed to operate in discontinuous conduction mode (DCM), continuous conduction mode (CCM), or a combination of the two. The individual modes of operation, via distinct PFC inductor current waveforms, impact the overall efficiency of power conversion and, by extension, temperature distribution in the magnetic component. This paper investigates how the examined conduction modes bear on distinct power-loss mechanisms present in the PFC inductor, including high-frequency eddy-current-generating phenomena, and the fringing effect in particular. As demonstrated herein, the DCM operation, for the set output power level, exhibits exacerbated power dissipation in the winding of the inductor due to the somewhat increased RSM value of the current and the intensified fringing magnetic flux at an air gap. The latter assertion will undergo further, more quantitatively focused research. Finally, the construction of the coil was optimised to reduce power loss by diminishing eddy-current mechanisms.
Learning-based Homothetic Tube MPC
In this paper, we study homothetic tube model predictive control (MPC) of discrete-time linear systems subject to bounded additive disturbance and mixed constraints on the state and input. Different from most existing work on robust MPC, we assume that the true disturbance set is unknown but a conservative surrogate is available a priori. Leveraging the real-time data, we develop an online learning algorithm to approximate the true disturbance set. This approximation and the corresponding constraints in the MPC optimisation are updated online using computationally convenient linear programs. We provide statistical gaps between the true and learned disturbance sets, based on which, probabilistic recursive feasibility of homothetic tube MPC problems is discussed. Numerical simulations are provided to demonstrate the efficacy of our proposed algorithm and compare with state-of-the-art MPC algorithms.
comment: Accepted for presentation at the 23rd European Control Conference
Multi-Class Stackelberg Games for the Co-Design of Networked Systems
We investigate a co-design problem, encompassing simultaneous design of system infrastructure and control, through a game-theoretical framework. To this end, we propose the co-design problem as a two-layer hierarchical strategic interaction. At the upper layer, a leader (or multiple leaders) determines system design parameters, while at the lower layer, a follower (or multiple followers) optimizes the control strategy. To capture this hierarchy, we propose four novel classes of Stackelberg games that integrate diverse strategic behaviors, including combinations of cooperative and non-cooperative interactions across two different layers. Notably, the leaders' interactions are represented using a normal-form game, whereas the followers' interactions are modeled by different games (dynamic games in discrete time). These distinct game structures result in a Stackelberg game that accommodates different game types per layer, and/or supports heterogeneous strategic behaviors involving cooperation and non-cooperation simultaneously. Learning algorithms using the best-response dynamics are used to solve the game problems when considering a discrete strategic space for the leaders. The efficacy of the proposed approach is demonstrated through an application to the co-design of the Barcelona drinking water network.
Enabling Robots to Autonomously Search Dynamic Cluttered Post-Disaster Environments
Robots will bring search and rescue (SaR) in disaster response to another level, in case they can autonomously take over dangerous SaR tasks from humans. A main challenge for autonomous SaR robots is to safely navigate in cluttered environments with uncertainties, while avoiding static and moving obstacles. We propose an integrated control framework for SaR robots in dynamic, uncertain environments, including a computationally efficient heuristic motion planning system that provides a nominal (assuming there are no uncertainties) collision-free trajectory for SaR robots and a robust motion tracking system that steers the robot to track this reference trajectory, taking into account the impact of uncertainties. The control architecture guarantees a balanced trade-off among various SaR objectives, while handling the hard constraints, including safety. The results of various computer-based simulations, presented in this paper, showed significant out-performance (of up to 42.3%) of the proposed integrated control architecture compared to two commonly used state-of-the-art methods (Rapidly-exploring Random Tree and Artificial Potential Function) in reaching targets (e.g., trapped victims in SaR) safely, collision-free, and in the shortest possible time.
Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots
Autonomous robots deployed in unknown search-and-rescue (SaR) environments can significantly improve the efficiency of the mission by assisting in fast localisation and rescue of the trapped victims. We propose a novel integrated hierarchical control architecture, called model predictive fuzzy control (MPFC), for autonomous mission planning of multi-robot SaR systems that should efficiently map an unknown environment: We combine model predictive control (MPC) and fuzzy logic control (FLC), where the robots are locally controlled by computationally efficient FLC controllers, and the parameters of these local controllers are tuned via a centralised MPC controller, in a regular or event-triggered manner. The proposed architecture provides three main advantages: (1) The control decisions are made by the FLC controllers, thus the real-time computation time is affordable. (2) The centralised MPC controller optimises the performance criteria with a global and predictive vision of the system dynamics, and updates the parameters of the FLC controllers accordingly. (3) FLC controllers are heuristic by nature and thus do not take into account optimality in their decisions, while the tuned parameters via the MPC controller can indirectly incorporate some level of optimality in local decisions of the robots. A simulation environment for victim detection in a disaster environment was designed in MATLAB using discrete, 2-D grid-based models. While being comparable from the point of computational efficiency, the integrated MPFC architecture improves the performance of the multi-robot SaR system compared to decentralised FLC controllers. Moreover, the performance of MPFC is comparable to the performance of centralised MPC for path planning of SaR robots, whereas MPFC requires significantly less computational resources, since the number of the optimisation variables in the control problem are reduced.
Non-linear dynamics of multibody systems: a system-based approach
This paper presents causal block-diagram models to represent the equations of motion of multi-body systems in a very compact and simple closed form. Both the forward dynamics (from the forces and torques imposed at the various degrees-of-freedom to the motions of these degrees-of-freedom) or the inverse dynamics (from the motions imposed at the degrees-of-freedom to the resulting forces and torques) can be considered and described by a block diagram model. This work extends the Two-Input Two-Output Port (TITOP) theory by including all non-linear terms and uniform or gravitational acceleration fields. Connection among different blocks is possible through the definition of the motion vector. The model of a system composed of a floating base, rigid bodies, revolute and prismatic joints, working under gravity is developed to illustrate the methodology. The proposed model is validated by simulation and cross-checking with a model built using an alternative modeling tool on a scenario where the nonlinear terms are determining.
comment: submitteds to the Journal of Multobody System Dynamics
Integrated Sensing, Computing, Communication, and Control for Time-Sequence-Based Semantic Communications
In the upcoming industrial internet of things (IIoT) era, a surge of task-oriented applications will rely on real-time wireless control systems (WCSs). For these systems, ultra-reliable and low-latency wireless communication will be crucial to ensure the timely transmission of control information. To achieve this purpose, we propose a novel time-sequence-based semantic communication paradigm, where an integrated sensing, computing, communication, and control (ISC3) architecture is developed to make sensible semantic inference (SI) for the control information over time sequences, enabling adaptive control of the robot. However, due to the causal correlations in the time sequence, the control information does not present the Markov property. To address this challenge, we compute the mutual information of the control information sensed at the transmitter (Tx) over different time and identify their temporal semantic correlation via a semantic feature extractor (SFE) module. By this means, highly correlated information transmission can be avoided, thus greatly reducing the communication overhead. Meanwhile, a semantic feature reconstructor (SFR) module is employed at the receiver (Rx) to reconstruct the control information based on the previously received one if the information transmission is not activated at the Tx. Furthermore, a control gain policy is also employed at the Rx to adaptively adjust the control gain for the controlled target based on several practical aspects such as the quality of the information transmission from the Tx to the Rx. We design the neural network structures of the above modules/policies and train their parameters by a novel hybrid reward multi-agent deep reinforcement learning framework. On-site experiments are conducted to evaluate the performance of our proposed method in practice, which shows significant gains over other baseline schemes.
comment: This version of the manuscript was submitted to IEEE Transactions on Communications for possible publication
The ISAC systems aided by MIMO, RIS and with Beamforming Techniques
This paper explores the integration of communication and sensing in modern wireless systems through the configuration of BS and RIS antenna elements. By leveraging time multiplexing for both communication and sensing, the proposed system optimizes spectral efficiency and operational performance. The use of static RIS configurations tailored to specific environments eliminates the need for dynamic reconfigurations, enhancing system agility, reducing processing complexity, and improving sensing accuracy. The system incorporates trilateration, angle of arrival, and time of arrival techniques to enable precise user localization by combining signals reflected along multiple paths. This method helps choose the best connections and lowers sensing costs while preventing interference with communication data, highlighting the need to bring together new technologies like passive and adaptive beamforming in one system.
comment: 32 pages, 7 figures, 2 tables
Global Task-aware Fault Detection, Identification For On-Orbit Multi-Spacecraft Collaborative Inspection
In this paper, we present a global-to-local task-aware fault detection and identification algorithm to detect failures in a multi-spacecraft system performing a collaborative inspection (referred to as global) task. The inspection task is encoded as a cost functional $\costH$ that informs global (task allocation and assignment) and local (agent-level) decision-making. The metric $\costH$ is a function of the inspection sensor model, and the agent full-pose. We use the cost functional $\costH$ to design a metric that compares the expected and actual performance to detect the faulty agent using a threshold. We use higher-order cost gradients $\costH$ to derive a new metric to identify the type of fault, including task-specific sensor fault, an agent-level actuator, and sensor faults. Furthermore, we propose an approach to design adaptive thresholds for each fault mentioned above to incorporate the time dependence of the inspection task. We demonstrate the efficacy of the proposed method empirically, by simulating and detecting faults (such as inspection sensor faults, actuators, and sensor faults) in a low-Earth orbit collaborative spacecraft inspection task using the metrics and the threshold designed using the global task cost $\costH$.
comment: published. 33rd AAS AIAA Conference 2023
Coevolution of Actions and Opinions in Networks of Coordinating and Anti-Coordinating Agents
In this paper, we investigate the dynamics of coordinating and anti-coordinating agents in a coevolutionary model for actions and opinions. In the model, the individuals of a population interact on a two-layer network, sharing their opinions and observing others' action, while revising their own opinions and actions according to a game-theoretic mechanism, grounded in the social psychology literature. First, we consider the scenario of coordinating agents, where convergence to a Nash equilibrium (NE) is guaranteed. We identify conditions for reaching consensus configurations and establish regions of attraction for these equilibria. Second, we study networks of anti-coordinating agents. In this second scenario, we prove that all trajectories converge to a NE by leveraging potential game theory. Then, we establish analytical conditions on the network structure and model parameters to guarantee the existence of consensus and polarized equilibria, characterizing their regions of attraction.
comment: Manuscript under review as a journal submission
Constraint Selection in Optimization-Based Controllers
Human-machine collaboration often involves constrained optimization problems for decision-making processes. However, when the machine is a dynamical system with a continuously evolving state, infeasibility due to multiple conflicting constraints can lead to dangerous outcomes. In this work, we propose a heuristic-based method that resolves infeasibility at every time step by selectively disregarding a subset of soft constraints based on the past values of the Lagrange multipliers. Compared to existing approaches, our method requires the solution of a smaller optimization problem to determine feasibility, resulting in significantly faster computation. Through a series of simulations, we demonstrate that our algorithm achieves performance comparable to state-of-the-art methods while offering improved computational efficiency.
comment: Submitted to IEEE Control Systems Letters (L-CSS)
An Overview of the Prospects and Challenges of Using Artificial Intelligence for Energy Management Systems in Microgrids
Microgrids have emerged as a pivotal solution in the quest for a sustainable and energy-efficient future. While microgrids offer numerous advantages, they are also prone to issues related to reliably forecasting renewable energy demand and production, protecting against cyberattacks, controlling operational costs, optimizing power flow, and regulating the performance of energy management systems (EMS). Tackling these energy management challenges is essential to facilitate microgrid applications and seamlessly incorporate renewable energy resources. Artificial intelligence (AI) has recently demonstrated immense potential for optimizing energy management in microgrids, providing efficient and reliable solutions. This paper highlights the combined benefits of enabling AI-based methodologies in the energy management systems of microgrids by examining the applicability and efficiency of AI-based EMS in achieving specific technical and economic objectives. The paper also points out several future research directions that promise to spearhead AI-driven EMS, namely the development of self-healing microgrids, integration with blockchain technology, use of Internet of things (IoT), and addressing interpretability, data privacy, scalability, and the prospects to generative AI in the context of future AI-based EMS.
comment: 70 pages, 7 figures
A Decision-Focused Predict-then-Bid Framework for Strategic Energy Storage
This paper introduces a novel decision-focused framework for energy storage arbitrage bidding. Inspired by the bidding process for energy storage in electricity markets, we propose a predict-then-bid end-to-end method incorporating the storage arbitrage optimization and market clearing models. This is achieved through a tri-layer framework that combines a price prediction layer with a two-stage optimization problem: an energy storage optimization layer and a market-clearing optimization layer. We leverage the implicit function theorem for gradient computation in the first optimization layer and incorporate a perturbation-based approach into the decision-focused loss function to ensure differentiability in the market-clearing layer. Numerical experiments using electricity market data from New York demonstrate that our bidding design substantially outperforms existing methods, achieving the highest profits and showcasing the effectiveness of the proposed approach.
TTT: A Temporal Refinement Heuristic for Tenuously Tractable Discrete Time Reachability Problems
Reachable set computation is an important tool for analyzing control systems. Simulating a control system can show general trends, but a formal tool like reachability analysis can provide guarantees of correctness. Reachability analysis for complex control systems, e.g., with nonlinear dynamics and/or a neural network controller, is often either slow or overly conservative. To address these challenges, much literature has focused on spatial refinement, i.e., tuning the discretization of the input sets and intermediate reachable sets. This paper introduces the idea of temporal refinement: automatically choosing when along the horizon of the reachability problem to execute slow symbolic queries which incur less approximation error versus fast concrete queries which incur more approximation error. Temporal refinement can be combined with other refinement approaches as an additional tool to trade off tractability and tightness in approximate reachable set computation. We introduce a temporal refinement algorithm and demonstrate its effectiveness at computing approximate reachable sets for nonlinear systems with neural network controllers. We calculate reachable sets with varying computational budget and show that our algorithm can generate approximate reachable sets with a similar amount of error to the baseline in 20-70% less time.
comment: To appear in the proceedings of the American Control Conference (ACC) 2025
The statistical spread of transmission outages on a fast protection time scale based on utility data
When there is a fault, the protection system automatically removes one or more transmission lines on a fast time scale of less than one minute. The outaged lines form a pattern in the transmission network. We extract these patterns from utility outage data, determine some key statistics of these patterns, and then show how to generate new patterns consistent with these statistics. The generated patterns provide a new and easily feasible way to model the overall effect of the protection system at the scale of a large transmission system. This new generative modeling of protection is expected to contribute to simulations of disturbances in large grids so that they can better quantify the risk of blackouts. Analysis of the pattern sizes suggests an index that describes how much outages spread in the transmission network at the fast timescale.
Quantifying the Aggregate Flexibility of Electric Vehicle Charging Stations for Market-based Congestion Management Services
Electric vehicles (EVs) play a crucial role in the transition towards sustainable modes of transportation and thus are critical to the energy transition. As their number grows, managing the aggregate power of EV charging is crucial to maintain grid stability and mitigate congestion. This study analyses more than 500 thousand real charging transactions in the Netherlands to explore the challenge and opportunity for the energy system presented by EV growth and smart charging flexibility. Specifically, it analyses the collective ability to provide congestion management services according to the specifications of those services in the Netherlands. In this study, a data-driven model of charging behaviour is created to explore the implications of delivering dependable congestion management services at various aggregation levels and types of service. The probabilistic ability to offer different flexibility products, namely, redispatch and capacity limitation, for congestion management, is assessed for different categories of charging stations (CS) and dispatch strategies. These probabilities can help EV aggregators, such as charging point operators, make informed decisions about offering congestion mitigation products per relevant regulations and distribution system operators to assess their potential. Further, it is shown how machine learning models can be incorporated to predict the day-ahead consumption, followed by operationally predicting redispatch flexibility. The findings demonstrate that the timing of EV arrivals, departures, and connections plays a crucial role in determining the feasibility of product offerings, and dependable services can generally be delivered using a sufficiently large number of CSs.
comment: 15 Pages,12 figures, 2 tables
AI-Aided Kalman Filters
The Kalman filter (KF) and its variants are among the most celebrated algorithms in signal processing. These methods are used for state estimation of dynamic systems by relying on mathematical representations in the form of simple state-space (SS) models, which may be crude and inaccurate descriptions of the underlying dynamics. Emerging data-centric artificial intelligence (AI) techniques tackle these tasks using deep neural networks (DNNs), which are model-agnostic. Recent developments illustrate the possibility of fusing DNNs with classic Kalman-type filtering, obtaining systems that learn to track in partially known dynamics. This article provides a tutorial-style overview of design approaches for incorporating AI in aiding KF-type algorithms. We review both generic and dedicated DNN architectures suitable for state estimation, and provide a systematic presentation of techniques for fusing AI tools with KFs and for leveraging partial SS modeling and data, categorizing design approaches into task-oriented and SS model-oriented. The usefulness of each approach in preserving the individual strengths of model-based KFs and data-driven DNNs is investigated in a qualitative and quantitative study, whose code is publicly available, illustrating the gains of hybrid model-based/data-driven designs. We also discuss existing challenges and future research directions that arise from fusing AI and Kalman-type algorithms.
comment: Submitted to the IEEE Signal Processing Magazine
Neural Configuration-Space Barriers for Manipulation Planning and Control
Planning and control for high-dimensional robot manipulators in cluttered, dynamic environments require both computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as robot body representations, we propose a unified framework for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduce uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that explicitly accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a 6-DoF xArm manipulator show that our neural CDF barrier formulation enables efficient planning and robust real-time safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments
We introduce a novel method for mobile robot navigation in dynamic, unknown environments, leveraging onboard sensing and distributionally robust optimization to impose probabilistic safety constraints. Our method introduces a distributionally robust control barrier function (DR-CBF) that directly integrates noisy sensor measurements and state estimates to define safety constraints. This approach is applicable to a wide range of control-affine dynamics, generalizable to robots with complex geometries, and capable of operating at real-time control frequencies. Coupled with a control Lyapunov function (CLF) for path following, the proposed CLF-DR-CBF control synthesis method achieves safe, robust, and efficient navigation in challenging environments. We demonstrate the effectiveness and robustness of our approach for safe autonomous navigation under uncertainty in simulations and real-world experiments with differential-drive robots.
comment: Project page: https://existentialrobotics.org/DRO_Safe_Navigation
Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning
Transport systems on networks are crucial in various applications, but face a significant risk of being adversely affected by unforeseen circumstances such as disasters. The application of entropy-regularized optimal transport (OT) on graph structures has been investigated to enhance the robustness of transport on such networks. In this study, we propose an imitation-regularized OT (I-OT) that mathematically incorporates prior knowledge into the robustness of OT. This method is expected to enhance interpretability by integrating human insights into robustness and to accelerate practical applications. Furthermore, we mathematically verify the robustness of I-OT and discuss how these robustness properties relate to real-world applications. The effectiveness of this method is validated through a logistics simulation using automotive parts data.
comment: Accepted for publication in: IEEE Control Systems Letters, vol. 8, pp. 3470-3475, 2024, doi: 10.1109/LCSYS.2025.3546804
CalibRefine: Deep Learning-Based Online Automatic Targetless LiDAR-Camera Calibration with Iterative and Attention-Driven Post-Refinement
Accurate multi-sensor calibration is essential for deploying robust perception systems in applications such as autonomous driving and intelligent transportation. Existing LiDAR-camera calibration methods often rely on manually placed targets, preliminary parameter estimates, or intensive data preprocessing, limiting their scalability and adaptability in real-world settings. In this work, we propose a fully automatic, targetless, and online calibration framework, CalibRefine, which directly processes raw LiDAR point clouds and camera images. Our approach is divided into four stages: (1) a Common Feature Discriminator that leverages relative spatial positions, visual appearance embeddings, and semantic class cues to identify and generate reliable LiDAR-camera correspondences, (2) a coarse homography-based calibration that uses the matched feature correspondences to estimate an initial transformation between the LiDAR and camera frames, serving as the foundation for further refinement, (3) an iterative refinement to incrementally improve alignment as additional data frames become available, and (4) an attention-based refinement that addresses non-planar distortions by leveraging a Vision Transformer and cross-attention mechanisms. Extensive experiments on two urban traffic datasets demonstrate that CalibRefine achieves high-precision calibration with minimal human input, outperforming state-of-the-art targetless methods and matching or surpassing manually tuned baselines. Our results show that robust object-level feature matching, combined with iterative refinement and self-supervised attention-based refinement, enables reliable sensor alignment in complex real-world conditions without ground-truth matrices or elaborate preprocessing. Code is available at https://github.com/radar-lab/Lidar_Camera_Automatic_Calibration
Context-aware LLM-based Safe Control Against Latent Risks
Autonomous control systems face significant challenges in performing complex tasks in the presence of latent risks. To address this, we propose an integrated framework that combines Large Language Models (LLMs), numerical optimization, and optimization-based control to facilitate efficient subtask learning while ensuring safety against latent risks. The framework decomposes complex tasks into a sequence of context-aware subtasks that account for latent risks. These subtasks and their parameters are then refined through a multi-time-scale process: high-layer multi-turn in-context learning, mid-layer LLM Chain-of-Thought reasoning and numerical optimization, and low-layer model predictive control. The framework iteratively improves decisions by leveraging qualitative feedback and optimized trajectory data from lower-layer optimization processes and a physics simulator. We validate the proposed framework through simulated case studies involving robot arm and autonomous vehicle scenarios. The experiments demonstrate that the proposed framework can mediate actions based on the context and latent risks and learn complex behaviors efficiently.
Online Controller Synthesis for Robot Collision Avoidance: A Case Study
The inherent uncertainty of dynamic environments poses significant challenges for modeling robot behavior, particularly in tasks such as collision avoidance. This paper presents an online controller synthesis framework tailored for robots equipped with deep learning-based perception components, with a focus on addressing distribution shifts. Our approach integrates periodic monitoring and repair mechanisms for the deep neural network perception component, followed by uncertainty reassessment. These uncertainty evaluations are injected into a parametric discrete-time markov chain, enabling the synthesis of robust controllers via probabilistic model checking. To ensure high system availability during the repair process, we propose a dual-component configuration that seamlessly transitions between operational states. Through a case study on robot collision avoidance, we demonstrate the efficacy of our method, showcasing substantial performance improvements over baseline approaches. This work provides a comprehensive and scalable solution for enhancing the safety and reliability of autonomous systems operating in uncertain environments.
comment: The collaborators disagree to publish on this platform
Uncertainty-Aware Digital Twins: Robust Model Predictive Control using Time-Series Deep Quantile Learning
Digital Twins, virtual replicas of physical systems that enable real-time monitoring, model updates, predictions, and decision-making, present novel avenues for proactive control strategies for autonomous systems. However, achieving real-time decision-making in Digital Twins considering uncertainty necessitates an efficient uncertainty quantification (UQ) approach and optimization driven by accurate predictions of system behaviors, which remains a challenge for learning-based methods. This paper presents a simultaneous multi-step robust model predictive control (MPC) framework that incorporates real-time decision-making with uncertainty awareness for Digital Twin systems. Leveraging a multistep ahead predictor named Time-Series Dense Encoder (TiDE) as the surrogate model, this framework differs from conventional MPC models that provide only one-step ahead predictions. In contrast, TiDE can predict future states within the prediction horizon in a one-shot, significantly accelerating MPC. Furthermore, quantile regression is employed with the training of TiDE to perform flexible while computationally efficient UQ on data uncertainty. Consequently, with the deep learning quantiles, the robust MPC problem is formulated into a deterministic optimization problem and provides a safety buffer that accommodates disturbances to enhance constraint satisfaction rate. As a result, the proposed method outperforms existing robust MPC methods by providing less-conservative UQ and has demonstrated efficacy in an engineering case study involving Directed Energy Deposition (DED) additive manufacturing. This proactive while uncertainty-aware control capability positions the proposed method as a potent tool for future Digital Twin applications and real-time process control in engineering systems.
Characterizing Trust and Resilience in Distributed Consensus for Cyberphysical Systems
This work considers the problem of resilient consensus where stochastic values of trust between agents are available. Specifically, we derive a unified mathematical framework to characterize convergence, deviation of the consensus from the true consensus value, and expected convergence rate, when there exists additional information of trust between agents. We show that under certain conditions on the stochastic trust values and consensus protocol: 1) almost sure convergence to a common limit value is possible even when malicious agents constitute more than half of the network connectivity, 2) the deviation of the converged limit, from the case where there is no attack, i.e., the true consensus value, can be bounded with probability that approaches 1 exponentially, and 3) correct classification of malicious and legitimate agents can be attained in finite time almost surely. Further, the expected convergence rate decays exponentially as a function of the quality of the trust observations between agents.
Multiagent Systems
A Communication-First Account of Explanation
This paper develops a formal account of causal explanation, grounded in a theory of conversational pragmatics, and inspired by the interventionist idea that explanation is about asking and answering what-if-things-had-been-different questions. We illustrate the fruitfulness of the account, relative to previous accounts, by showing that widely recognised explanatory virtues emerge naturally, as do subtle empirical patterns concerning the impact of norms on causal judgments. This shows the value of a communication-first approach to explanation: getting clear on explanation's communicative dimension is an important prerequisite for philosophical work on explanation. The result is a simple but powerful framework for incorporating insights from the cognitive sciences into philosophical work on explanation, which will be useful for philosophers or cognitive scientists interested in explanation.
Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning
Solar sensor-based monitoring systems have become a crucial agricultural innovation, advancing farm management and animal welfare through integrating sensor technology, Internet-of-Things, and edge and cloud computing. However, the resilience of these systems to cyber-attacks and their adaptability to dynamic and constrained energy supplies remain largely unexplored. To address these challenges, we propose a sustainable smart farm network designed to maintain high-quality animal monitoring under various cyber and adversarial threats, as well as fluctuating energy conditions. Our approach utilizes deep reinforcement learning (DRL) to devise optimal policies that maximize both monitoring effectiveness and energy efficiency. To overcome DRL's inherent challenge of slow convergence, we integrate transfer learning (TL) and decision theory (DT) to accelerate the learning process. By incorporating DT-guided strategies, we optimize monitoring quality and energy sustainability, significantly reducing training time while achieving comparable performance rewards. Our experimental results prove that DT-guided DRL outperforms TL-enhanced DRL models, improving system performance and reducing training runtime by 47.5%.
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation
In real-world multi-agent systems (MASs), observation delays are ubiquitous, preventing agents from making decisions based on the environment's true state. An individual agent's local observation often consists of multiple components from other agents or dynamic entities in the environment. These discrete observation components with varying delay characteristics pose significant challenges for multi-agent reinforcement learning (MARL). In this paper, we first formulate the decentralized stochastic individual delay partially observable Markov decision process (DSID-POMDP) by extending the standard Dec-POMDP. We then propose the Rainbow Delay Compensation (RDC), a MARL training framework for addressing stochastic individual delays, along with recommended implementations for its constituent modules. We implement the DSID-POMDP's observation generation pattern using standard MARL benchmarks, including MPE and SMAC. Experiments demonstrate that baseline MARL methods suffer severe performance degradation under fixed and unfixed delays. The RDC-enhanced approach mitigates this issue, remarkably achieving ideal delay-free performance in certain delay scenarios while maintaining generalization capability. Our work provides a novel perspective on multi-agent delayed observation problems and offers an effective solution framework.
comment: The code will be open-sourced in the RDC-pymarl project under https://github.com/linkjoker1006
Simulation to Reality: Testbeds and Architectures for Connected and Automated Vehicles
Ensuring the safe and efficient operation of CAVs relies heavily on the software framework used. A software framework needs to ensure real-time properties, reliable communication, and efficient resource utilization. Furthermore, a software framework needs to enable seamless transition between testing stages, from simulation to small-scale to full-scale experiments. In this paper, we survey prominent software frameworks used for in-vehicle and inter-vehicle communication in CAVs. We analyze these frameworks regarding opportunities and challenges, such as their real-time properties and transitioning capabilities. Additionally, we delve into the tooling requirements necessary for addressing the associated challenges. We illustrate the practical implications of these challenges through case studies focusing on critical areas such as perception, motion planning, and control. Furthermore, we identify research gaps in the field, highlighting areas where further investigation is needed to advance the development and deployment of safe and efficient CAV systems.
Multi-Agent Deep Reinforcement Learning for Zonal Ancillary Market Coupling
We characterize zonal ancillary market coupling relying on noncooperative game theory. To that purpose, we formulate the ancillary market as a multi-leader single follower bilevel problem, that we subsequently cast as a generalized Nash game with side constraints and nonconvex feasibility sets. We determine conditions for equilibrium existence and show that the game has a generalized potential game structure. To compute market equilibrium, we rely on two exact approaches: an integrated optimization approach and Gauss-Seidel best-response, that we compare against multi-agent deep reinforcement learning. On real data from Germany and Austria, simulations indicate that multi-agent deep reinforcement learning achieves the smallest convergence rate but requires pretraining, while best-response is the slowest. On the economics side, multi-agent deep reinforcement learning results in smaller market costs compared to the exact methods, but at the cost of higher variability in the profit allocation among stakeholders. Further, stronger coupling between zones tends to reduce costs for larger zones.
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
This study explores the application of chaos engineering to enhance the robustness of Large Language Model-Based Multi-Agent Systems (LLM-MAS) in production-like environments under real-world conditions. LLM-MAS can potentially improve a wide range of tasks, from answering questions and generating content to automating customer support and improving decision-making processes. However, LLM-MAS in production or preproduction environments can be vulnerable to emergent errors or disruptions, such as hallucinations, agent failures, and agent communication failures. This study proposes a chaos engineering framework to proactively identify such vulnerabilities in LLM-MAS, assess and build resilience against them, and ensure reliable performance in critical applications.
Global Task-aware Fault Detection, Identification For On-Orbit Multi-Spacecraft Collaborative Inspection
In this paper, we present a global-to-local task-aware fault detection and identification algorithm to detect failures in a multi-spacecraft system performing a collaborative inspection (referred to as global) task. The inspection task is encoded as a cost functional $\costH$ that informs global (task allocation and assignment) and local (agent-level) decision-making. The metric $\costH$ is a function of the inspection sensor model, and the agent full-pose. We use the cost functional $\costH$ to design a metric that compares the expected and actual performance to detect the faulty agent using a threshold. We use higher-order cost gradients $\costH$ to derive a new metric to identify the type of fault, including task-specific sensor fault, an agent-level actuator, and sensor faults. Furthermore, we propose an approach to design adaptive thresholds for each fault mentioned above to incorporate the time dependence of the inspection task. We demonstrate the efficacy of the proposed method empirically, by simulating and detecting faults (such as inspection sensor faults, actuators, and sensor faults) in a low-Earth orbit collaborative spacecraft inspection task using the metrics and the threshold designed using the global task cost $\costH$.
comment: published. 33rd AAS AIAA Conference 2023
The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete
According to Yuval Noah Harari, large-scale human cooperation is driven by shared narratives that encode common beliefs and values. This study explores whether such narratives can similarly nudge LLM agents toward collaboration. We use a finitely repeated public goods game in which LLM agents choose either cooperative or egoistic spending strategies. We prime agents with stories highlighting teamwork to different degrees and test how this influences negotiation outcomes. Our experiments explore four questions:(1) How do narratives influence negotiation behavior? (2) What differs when agents share the same story versus different ones? (3) What happens when the agent numbers grow? (4) Are agents resilient against self-serving negotiators? We find that story-based priming significantly affects negotiation strategies and success rates. Common stories improve collaboration, benefiting each agent. By contrast, priming agents with different stories reverses this effect, and those agents primed toward self-interest prevail. We hypothesize that these results carry implications for multi-agent system design and AI alignment.
comment: 16 pages, 8 figures. Code available at https://github.com/storyagents25/story-agents
From Glue-Code to Protocols: A Critical Analysis of A2A and MCP Integration for Scalable Agent Systems
Artificial intelligence is rapidly evolving towards multi-agent systems where numerous AI agents collaborate and interact with external tools. Two key open standards, Google's Agent to Agent (A2A) protocol for inter-agent communication and Anthropic's Model Context Protocol (MCP) for standardized tool access, promise to overcome the limitations of fragmented, custom integration approaches. While their potential synergy is significant, this paper argues that effectively integrating A2A and MCP presents unique, emergent challenges at their intersection, particularly concerning semantic interoperability between agent tasks and tool capabilities, the compounded security risks arising from combined discovery and execution, and the practical governance required for the envisioned "Agent Economy". This work provides a critical analysis, moving beyond a survey to evaluate the practical implications and inherent difficulties of combining these horizontal and vertical integration standards. We examine the benefits (e.g., specialization, scalability) while critically assessing their dependencies and trade-offs in an integrated context. We identify key challenges increased by the integration, including novel security vulnerabilities, privacy complexities, debugging difficulties across protocols, and the need for robust semantic negotiation mechanisms. In summary, A2A+MCP offers a vital architectural foundation, but fully realizing its potential requires substantial advancements to manage the complexities of their combined operation.
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling ICML 2025
Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand for cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information and instance-specific temporal patterns, to guide and improve TSG. We introduce ``Text-Controlled TSG'', a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text description for supporting domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets, and improves controllability by 12.52% on MSE and 6.34% MAE compared to no text input generation, highlighting its potential for generating tailored time-series data.
comment: ICML 2025
Towards Enterprise-Ready Computer Using Generalist Agent
This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching a new state-of-the-art performance on the WebArena benchmark. We detail our development roadmap, the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and discuss key lessons learned and future challenges for enterprise adoption.
Cooperative Multi-Agent Planning with Adaptive Skill Synthesis
Despite much progress in training distributed artificial intelligence (AI), building cooperative multi-agent systems with multi-agent reinforcement learning (MARL) faces challenges in sample efficiency, interpretability, and transferability. Unlike traditional learning-based methods that require extensive interaction with the environment, large language models (LLMs) demonstrate remarkable capabilities in zero-shot planning and complex reasoning. However, existing LLM-based approaches heavily rely on text-based observations and struggle with the non-Markovian nature of multi-agent interactions under partial observability. We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making. The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies. COMPASS propagates entity information through multi-hop communication under partial observability. Evaluations on the improved StarCraft Multi-Agent Challenge (SMACv2) demonstrate COMPASS's strong performance against state-of-the-art MARL baselines across both symmetric and asymmetric scenarios. Notably, in the symmetric Protoss 5v5 task, COMPASS achieved a 57\% win rate, representing a 30 percentage point advantage over QMIX (27\%). Project page can be found at https://stellar-entremet-1720bb.netlify.app/.
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
We introduce LiteWebAgent, an open-source suite for VLM-based web agent applications. Our framework addresses a critical gap in the web agent ecosystem with a production-ready solution that combines minimal serverless backend configuration, intuitive user and browser interfaces, and extensible research capabilities in agent planning, memory, and tree search. For the core LiteWebAgent agent framework, we implemented a simple yet effective baseline using recursive function calling, providing with decoupled action generation and action grounding. In addition, we integrate advanced research components such as agent planning, agent workflow memory, and tree search in a modular and extensible manner. We then integrate the LiteWebAgent agent framework with frontend and backend as deployed systems in two formats: (1) a production Vercel-based web application, which provides users with an agent-controlled remote browser, (2) a Chrome extension leveraging LiteWebAgent's API to control an existing Chrome browser via CDP (Chrome DevTools Protocol). The LiteWebAgent framework is available at https://github.com/PathOnAI/LiteWebAgent, with deployed frontend at https://lite-web-agent.vercel.app/.
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
In this paper we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism, that performs ToM reasoning over teammates' underlying goals and future behavior, with a multiagent denoising-diffusion model, that generates plans for an agent and its teammates conditioned on both the agent's goals and its teammates' characteristics, as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing the usage of resources without sacrificing team performance. We also show that recent observations about the world and teammates' behavior collected by an agent over the course of an episode combined with ToM inferences are crucial to generate team-aware plans for dynamic adaptation to teammates, especially when no prior information is provided about them.
comment: Appears in Proc. of the Adaptive and Learning Agents Workshop (ALA 2025), ala-workshop.github.io
Towards a HIPAA Compliant Agentic AI System in Healthcare
Agentic AI systems powered by Large Language Models (LLMs) as their foundational reasoning engine, are transforming clinical workflows such as medical report generation and clinical summarization by autonomously analyzing sensitive healthcare data and executing decisions with minimal human oversight. However, their adoption demands strict compliance with regulatory frameworks such as Health Insurance Portability and Accountability Act (HIPAA), particularly when handling Protected Health Information (PHI). This work-in-progress paper introduces a HIPAA-compliant Agentic AI framework that enforces regulatory compliance through dynamic, context-aware policy enforcement. Our framework integrates three core mechanisms: (1) Attribute-Based Access Control (ABAC) for granular PHI governance, (2) a hybrid PHI sanitization pipeline combining regex patterns and BERT-based model to minimize leakage, and (3) immutable audit trails for compliance verification.
Robotics
TWIST: Teleoperated Whole-Body Imitation System
Teleoperating humanoid robots in a whole-body manner marks a fundamental step toward developing general-purpose robotic intelligence, with human motion providing an ideal interface for controlling all degrees of freedom. Yet, most current humanoid teleoperation systems fall short of enabling coordinated whole-body behavior, typically limiting themselves to isolated locomotion or manipulation tasks. We present the Teleoperated Whole-Body Imitation System (TWIST), a system for humanoid teleoperation through whole-body motion imitation. We first generate reference motion clips by retargeting human motion capture data to the humanoid robot. We then develop a robust, adaptive, and responsive whole-body controller using a combination of reinforcement learning and behavior cloning (RL+BC). Through systematic analysis, we demonstrate how incorporating privileged future motion frames and real-world motion capture (MoCap) data improves tracking accuracy. TWIST enables real-world humanoid robots to achieve unprecedented, versatile, and coordinated whole-body motor skills--spanning whole-body manipulation, legged manipulation, locomotion, and expressive movement--using a single unified neural network controller. Our project website: https://humanoid-teleop.github.io
comment: Project website: https://humanoid-teleop.github.io
Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control GECCO
Guiding biological systems toward desired states, such as morphogenetic outcomes, remains a fundamental challenge with far-reaching implications for medicine and synthetic biology. While large language models (LLMs) have enabled natural language as an interface for interpretable control in AI systems, their use as mediators for steering biological or cellular dynamics remains largely unexplored. In this work, we present a functional pipeline that translates natural language prompts into spatial vector fields capable of directing simulated cellular collectives. Our approach combines a large language model with an evolvable neural controller (Prompt-to-Intervention, or P2I), optimized via evolutionary strategies to generate behaviors such as clustering or scattering in a simulated 2D environment. We demonstrate that even with constrained vocabulary and simplified cell models, evolved P2I networks can successfully align cellular dynamics with user-defined goals expressed in plain language. This work offers a complete loop from language input to simulated bioelectric-like intervention to behavioral output, providing a foundation for future systems capable of natural language-driven cellular control.
comment: Accepted to GECCO Workshop on Bio-Inspired AI (ACM GECCO2025). 13 pages, 7 figures
Re-purposing a modular origami manipulator into an adaptive physical computer for machine learning and robotic perception
Physical computing has emerged as a powerful tool for performing intelligent tasks directly in the mechanical domain of functional materials and robots, reducing our reliance on the more traditional COMS computers. However, no systematic study explains how mechanical design can influence physical computing performance. This study sheds insights into this question by repurposing an origami-inspired modular robotic manipulator into an adaptive physical reservoir and systematically evaluating its computing capacity with different physical configurations, input setups, and computing tasks. By challenging this adaptive reservoir computer to complete the classical NARMA benchmark tasks, this study shows that its time series emulation performance directly correlates to the Peak Similarity Index (PSI), which quantifies the frequency spectrum correlation between the target output and reservoir dynamics. The adaptive reservoir also demonstrates perception capabilities, accurately extracting its payload weight and orientation information from the intrinsic dynamics. Importantly, such information extraction capability can be measured by the spatial correlation between nodal dynamics within the reservoir body. Finally, by integrating shape memory alloy (SMA) actuation, this study demonstrates how to exploit such computing power embodied in the physical body for practical, robotic operations. This study provides a strategic framework for harvesting computing power from soft robots and functional materials, demonstrating how design parameters and input selection can be configured based on computing task requirements. Extending this framework to bio-inspired adaptive materials, prosthetics, and self-adaptive soft robotic systems could enable next-generation embodied intelligence, where the physical structure can compute and interact with their digital counterparts.
Grasp the Graph (GtG) 2.0: Ensemble of GNNs for High-Precision Grasp Pose Detection in Clutter
Grasp pose detection in cluttered, real-world environments remains a significant challenge due to noisy and incomplete sensory data combined with complex object geometries. This paper introduces Grasp the Graph 2.0 (GtG 2.0) method, a lightweight yet highly effective hypothesis-and-test robotics grasping framework which leverages an ensemble of Graph Neural Networks for efficient geometric reasoning from point cloud data. Building on the success of GtG 1.0, which demonstrated the potential of Graph Neural Networks for grasp detection but was limited by assumptions of complete, noise-free point clouds and 4-Dof grasping, GtG 2.0 employs a conventional Grasp Pose Generator to efficiently produce 7-Dof grasp candidates. Candidates are assessed with an ensemble Graph Neural Network model which includes points within the gripper jaws (inside points) and surrounding contextual points (outside points). This improved representation boosts grasp detection performance over previous methods using the same generator. GtG 2.0 shows up to a 35% improvement in Average Precision on the GraspNet-1Billion benchmark compared to hypothesis-and-test and Graph Neural Network-based methods, ranking it among the top three frameworks. Experiments with a 3-Dof Delta Parallel robot and Kinect-v1 camera show a success rate of 91% and a clutter completion rate of 100%, demonstrating its flexibility and reliability.
comment: 9 Pages, 6 figures
LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots
Integrating artificial intelligence (AI) and stochastic technologies into the mobile robot navigation and control (MRNC) framework while adhering to rigorous safety standards presents significant challenges. To address these challenges, this paper proposes a comprehensively integrated MRNC framework for skid-steer wheeled mobile robots (SSWMRs), in which all components are actively engaged in real-time execution. The framework comprises: 1) a LiDAR-inertial simultaneous localization and mapping (SLAM) algorithm for estimating the current pose of the robot within the built map; 2) an effective path-following control system for generating desired linear and angular velocity commands based on the current pose and the desired pose; 3) inverse kinematics for transferring linear and angular velocity commands into left and right side velocity commands; and 4) a robust AI-driven (RAID) control system incorporating a radial basis function network (RBFN) with a new adaptive algorithm to enforce in-wheel actuation systems to track each side motion commands. To further meet safety requirements, the proposed RAID control within the MRNC framework of the SSWMR constrains AI-generated tracking performance within predefined overshoot and steady-state error limits, while ensuring robustness and system stability by compensating for modeling errors, unknown RBF weights, and external forces. Experimental results verify the proposed MRNC framework performance for a 4,836 kg SSWMR operating on soft terrain.
comment: This paper has been submitted in the IEEE CDC 2025 for potential presentation
Learning and Online Replication of Grasp Forces from Electromyography Signals for Prosthetic Finger Control ICRA 2025
Partial hand amputations significantly affect the physical and psychosocial well-being of individuals, yet intuitive control of externally powered prostheses remains an open challenge. To address this gap, we developed a force-controlled prosthetic finger activated by electromyography (EMG) signals. The prototype, constructed around a wrist brace, functions as a supernumerary finger placed near the index, allowing for early-stage evaluation on unimpaired subjects. A neural network-based model was then implemented to estimate fingertip forces from EMG inputs, allowing for online adjustment of the prosthetic finger grip strength. The force estimation model was validated through experiments with ten participants, demonstrating its effectiveness in predicting forces. Additionally, online trials with four users wearing the prosthesis exhibited precise control over the device. Our findings highlight the potential of using EMG-based force estimation to enhance the functionality of prosthetic fingers.
comment: 7 pages, 6 figures, to be presented at ICRA 2025
HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction
This paper introduces HapticVLM, a novel multimodal system that integrates vision-language reasoning with deep convolutional networks to enable real-time haptic feedback. HapticVLM leverages a ConvNeXt-based material recognition module to generate robust visual embeddings for accurate identification of object materials, while a state-of-the-art Vision-Language Model (Qwen2-VL-2B-Instruct) infers ambient temperature from environmental cues. The system synthesizes tactile sensations by delivering vibrotactile feedback through speakers and thermal cues via a Peltier module, thereby bridging the gap between visual perception and tactile experience. Experimental evaluations demonstrate an average recognition accuracy of 84.67% across five distinct auditory-tactile patterns and a temperature estimation accuracy of 86.7% based on a tolerance-based evaluation method with an 8{\deg}C margin of error across 15 scenarios. Although promising, the current study is limited by the use of a small set of prominent patterns and a modest participant pool. Future work will focus on expanding the range of tactile patterns and increasing user studies to further refine and validate the system's performance. Overall, HapticVLM presents a significant step toward context-aware, multimodal haptic interaction with potential applications in virtual reality, and assistive technologies.
comment: Submitted to IEEE conf
Data-Driven Energy Modeling of Industrial IoT Systems: A Benchmarking Approach
The widespread adoption of IoT has driven the development of cyber-physical systems (CPS) in industrial environments, leveraging Industrial IoTs (IIoTs) to automate manufacturing processes and enhance productivity. The transition to autonomous systems introduces significant operational costs, particularly in terms of energy consumption. Accurate modeling and prediction of IIoT energy requirements are critical, but traditional physics- and engineering-based approaches often fall short in addressing these challenges comprehensively. In this paper, we propose a novel methodology for benchmarking and analyzing IIoT devices and applications to uncover insights into their power demands, energy consumption, and performance. To demonstrate this methodology, we develop a comprehensive framework and apply it to study an industrial CPS comprising an educational robotic arm, a conveyor belt, a smart camera, and a compute node. By creating micro-benchmarks and an end-to-end application within this framework, we create an extensive performance and power consumption dataset, which we use to train and analyze ML models for predicting energy usage from features of the application and the CPS system. The proposed methodology and framework provide valuable insights into the energy dynamics of industrial CPS, offering practical implications for researchers and practitioners aiming to enhance the efficiency and sustainability of IIoT-driven automation.
Corr2Distrib: Making Ambiguous Correspondences an Ally to Predict Reliable 6D Pose Distributions
We introduce Corr2Distrib, the first correspondence-based method which estimates a 6D camera pose distribution from an RGB image, explaining the observations. Indeed, symmetries and occlusions introduce visual ambiguities, leading to multiple valid poses. While a few recent methods tackle this problem, they do not rely on local correspondences which, according to the BOP Challenge, are currently the most effective way to estimate a single 6DoF pose solution. Using correspondences to estimate a pose distribution is not straightforward, since ambiguous correspondences induced by visual ambiguities drastically decrease the performance of PnP. With Corr2Distrib, we turn these ambiguities into an advantage to recover all valid poses. Corr2Distrib first learns a symmetry-aware representation for each 3D point on the object's surface, characterized by a descriptor and a local frame. This representation enables the generation of 3DoF rotation hypotheses from single 2D-3D correspondences. Next, we refine these hypotheses into a 6DoF pose distribution using PnP and pose scoring. Our experimental evaluations on complex non-synthetic scenes show that Corr2Distrib outperforms state-of-the-art solutions for both pose distribution estimation and single pose estimation from an RGB image, demonstrating the potential of correspondences-based approaches.
comment: 8 pages, 5 figures
Automated Hybrid Reward Scheduling via Large Language Models for Robotic Skill Learning
Enabling a high-degree-of-freedom robot to learn specific skills is a challenging task due to the complexity of robotic dynamics. Reinforcement learning (RL) has emerged as a promising solution; however, addressing such problems requires the design of multiple reward functions to account for various constraints in robotic motion. Existing approaches typically sum all reward components indiscriminately to optimize the RL value function and policy. We argue that this uniform inclusion of all reward components in policy optimization is inefficient and limits the robot's learning performance. To address this, we propose an Automated Hybrid Reward Scheduling (AHRS) framework based on Large Language Models (LLMs). This paradigm dynamically adjusts the learning intensity of each reward component throughout the policy optimization process, enabling robots to acquire skills in a gradual and structured manner. Specifically, we design a multi-branch value network, where each branch corresponds to a distinct reward component. During policy optimization, each branch is assigned a weight that reflects its importance, and these weights are automatically computed based on rules designed by LLMs. The LLM generates a rule set in advance, derived from the task description, and during training, it selects a weight calculation rule from the library based on language prompts that evaluate the performance of each branch. Experimental results demonstrate that the AHRS method achieves an average 6.48% performance improvement across multiple high-degree-of-freedom robotic tasks.
Point Cloud Recombination: Systematic Real Data Augmentation Using Robotic Targets for LiDAR Perception Validation
The validation of LiDAR-based perception of intelligent mobile systems operating in open-world applications remains a challenge due to the variability of real environmental conditions. Virtual simulations allow the generation of arbitrary scenes under controlled conditions but lack physical sensor characteristics, such as intensity responses or material-dependent effects. In contrast, real-world data offers true sensor realism but provides less control over influencing factors, hindering sufficient validation. Existing approaches address this problem with augmentation of real-world point cloud data by transferring objects between scenes. However, these methods do not consider validation and remain limited in controllability because they rely on empirical data. We solve these limitations by proposing Point Cloud Recombination, which systematically augments captured point cloud scenes by integrating point clouds acquired from physical target objects measured in controlled laboratory environments. Thus enabling the creation of vast amounts and varieties of repeatable, physically accurate test scenes with respect to phenomena-aware occlusions with registered 3D meshes. Using the Ouster OS1-128 Rev7 sensor, we demonstrate the augmentation of real-world urban and rural scenes with humanoid targets featuring varied clothing and poses, for repeatable positioning. We show that the recombined scenes closely match real sensor outputs, enabling targeted testing, scalable failure analysis, and improved system safety. By providing controlled yet sensor-realistic data, our method enables trustworthy conclusions about the limitations of specific sensors in compound with their algorithms, e.g., object detection.
comment: Pre-print for IEEE IAVVC 2025
ZeloS -- A Research Platform for Early-Stage Validation of Research Findings Related to Automated Driving
This paper presents ZeloS, a research platform designed and built for practical validation of automated driving methods in an early stage of research. We overview ZeloS' hardware setup and automation architecture and focus on motion planning and control. ZeloS weighs 69 kg, measures a length of 117 cm, and is equipped with all-wheel steering, all-wheel drive, and various onboard sensors for localization. The hardware setup and the automation architecture of ZeloS are designed and built with a focus on modularity and the goal of being simple yet effective. The modular design allows the modification of individual automation modules without the need for extensive onboarding into the automation architecture. As such, this design supports ZeloS in being a versatile research platform for validating various automated driving methods. The motion planning component and control of ZeloS feature optimization-based methods that allow for explicitly considering constraints. We demonstrate the hardware and automation setup by presenting experimental data.
Quadrupedal Spine Control Strategies: Exploring Correlations Between System Dynamic Responses and Human Perspectives
Unlike their biological cousins, the majority of existing quadrupedal robots are constructed with rigid chassis. This results in motion that is either beetle-like or distinctly robotic, lacking the natural fluidity characteristic of mammalian movements. Existing literature on quadrupedal robots with spinal configurations primarily focuses on energy efficiency and does not consider the effects in human-robot interaction scenarios. Our contributions include an initial investigation into various trajectory generation strategies for a quadrupedal robot with a four degree of freedom spine, and an analysis on the effect that such methods have on human perception of gait naturalness compared to a fixed spine baseline. The strategies were evaluated using videos of walking, trotting and turning simulations. Among the four different strategies developed, the optimised time varying and the foot-tracking strategies were perceived to be more natural than the baseline in a randomised trial with 50 participants. Although none of the strategies demonstrated any energy efficiency improvements over the no-spine baseline, some showed greater footfall consistency at higher speeds. Given the greater likeability drawn from the more natural locomotion patterns, this type of robot displays potential for applications in social robot scenarios such as elderly care, where energy efficiency is not a primary concern.
comment: 27 pages, 13 figures
Estimating Commonsense Scene Composition on Belief Scene Graphs ICRA25
This work establishes the concept of commonsense scene composition, with a focus on extending Belief Scene Graphs by estimating the spatial distribution of unseen objects. Specifically, the commonsense scene composition capability refers to the understanding of the spatial relationships among related objects in the scene, which in this article is modeled as a joint probability distribution for all possible locations of the semantic object class. The proposed framework includes two variants of a Correlation Information (CECI) model for learning probability distributions: (i) a baseline approach based on a Graph Convolutional Network, and (ii) a neuro-symbolic extension that integrates a spatial ontology based on Large Language Models (LLMs). Furthermore, this article provides a detailed description of the dataset generation process for such tasks. Finally, the framework has been validated through multiple runs on simulated data, as well as in a real-world indoor environment, demonstrating its ability to spatially interpret scenes across different room types.
comment: Accepted at ICRA25
A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints
We present a real-time safety filter for motion planning, such as learning-based methods, using Control Barrier Functions (CBFs), which provides formal guarantees for collision avoidance with road boundaries. A key feature of our approach is its ability to directly incorporate road geometries of arbitrary shape without resorting to conservative overapproximations. We formulate the safety filter as a constrained optimization problem in the form of a Quadratic Program (QP). It achieves safety by making minimal, necessary adjustments to the control actions issued by the nominal motion planner. We validate our safety filter through extensive numerical experiments across a variety of traffic scenarios featuring complex roads. The results confirm its reliable safety and high computational efficiency (execution frequency up to 40 Hz). Code & Video Demo: github.com/bassamlab/SigmaRL
MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans CVPR 2025
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process heavily relies on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To scalably produce realistic and interactive 3D scenes, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15366 objects spanning 831 fine-grained categories. Then, we introduce Scan2Sim, a robust multi-modal alignment model, which enables the automated, high-quality replacement of assets, thereby eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on small item layouts for robotic manipulation and a domain transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScene's potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.
comment: CVPR 2025
Riemannian Direct Trajectory Optimization of Rigid Bodies on Matrix Lie Groups
Designing dynamically feasible trajectories for rigid bodies is a fundamental problem in robotics. Although direct trajectory optimization is widely applied to solve this problem, inappropriate parameterizations of rigid body dynamics often result in slow convergence and violations of the intrinsic topological structure of the rotation group. This paper introduces a Riemannian optimization framework for direct trajectory optimization of rigid bodies. We first use the Lie Group Variational Integrator to formulate the discrete rigid body dynamics on matrix Lie groups. We then derive the closed-form first- and second-order Riemannian derivatives of the dynamics. Finally, this work applies a line-search Riemannian Interior Point Method (RIPM) to perform trajectory optimization with general nonlinear constraints. As the optimization is performed on matrix Lie groups, it is correct-by-construction to respect the topological structure of the rotation group and be free of singularities. The paper demonstrates that both the derivative evaluations and Newton steps required to solve the RIPM exhibit linear complexity with respect to the planning horizon and system degrees of freedom. Simulation results illustrate that the proposed method is faster than conventional methods by an order of magnitude in challenging robotics tasks.
comment: Accepted to Robotics: Science and Systems (RSS) 2025
Sim2Real Transfer for Vision-Based Grasp Verification
The verification of successful grasps is a crucial aspect of robot manipulation, particularly when handling deformable objects. Traditional methods relying on force and tactile sensors often struggle with deformable and non-rigid objects. In this work, we present a vision-based approach for grasp verification to determine whether the robotic gripper has successfully grasped an object. Our method employs a two-stage architecture; first YOLO-based object detection model to detect and locate the robot's gripper and then a ResNet-based classifier determines the presence of an object. To address the limitations of real-world data capture, we introduce HSR-GraspSynth, a synthetic dataset designed to simulate diverse grasping scenarios. Furthermore, we explore the use of Visual Question Answering capabilities as a zero-shot baseline to which we compare our model. Experimental results demonstrate that our approach achieves high accuracy in real-world environments, with potential for integration into grasping pipelines. Code and datasets are publicly available at https://github.com/pauamargant/HSR-GraspSynth .
comment: Accepted at Austrian Robotics Workshop 2025
A Modal-Space Formulation for Momentum Observer Contact Estimation and Effects of Uncertainty for Continuum Robots
Contact detection for continuum and soft robots has been limited in past works to statics or kinematics-based methods with assumed circular bending curvature or known bending profiles. In this paper, we adapt the generalized momentum observer contact estimation method to continuum robots. This is made possible by leveraging recent results for real-time shape sensing of continuum robots along with a modal-space representation of the robot dynamics. In addition to presenting an approach for estimating the generalized forces due to contact via a momentum observer, we present a constrained optimization method to identify the wrench imparted on the robot during contact. We also present an approach for investigating the effects of unmodeled deviations in the robot's dynamic state on the contact detection method and we validate our algorithm by simulations and experiments. We also compare the performance of the momentum observer to the joint force deviation method, a direct estimation approach using the robot's full dynamic model. We also demonstrate a basic extension of the method to multisegment continuum robots. Results presented in this work extend dynamic contact detection to the domain of continuum and soft robots and can be used to improve the safety of large-scale continuum robots for human-robot collaboration.
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.
Zero-shot Sim2Real Transfer for Magnet-Based Tactile Sensor on Insertion Tasks
Tactile sensing is an important sensing modality for robot manipulation. Among different types of tactile sensors, magnet-based sensors, like u-skin, balance well between high durability and tactile density. However, the large sim-to-real gap of tactile sensors prevents robots from acquiring useful tactile-based manipulation skills from simulation data, a recipe that has been successful for achieving complex and sophisticated control policies. Prior work has implemented binarization techniques to bridge the sim-to-real gap for dexterous in-hand manipulation. However, binarization inherently loses much information that is useful in many other tasks, e.g., insertion. In our work, we propose GCS, a novel sim-to-real technique to learn contact-rich skills with dense, distributed, 3-axis tactile readings. We evaluate our approach on blind insertion tasks and show zero-shot sim-to-real transfer of RL policies with raw tactile reading as input.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety and actively regulate environmental coupling, allowing, for instance, safe object manipulation under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
RobustDexGrasp: Robust Dexterous Grasping of General Objects
The ability to robustly grasp a variety of objects is essential for dexterous robots. In this paper, we present a framework for zero-shot dynamic dexterous grasping using single-view visual inputs, designed to be resilient to various disturbances. Our approach utilizes a hand-centric object shape representation based on dynamic distance vectors between finger joints and object surfaces. This representation captures the local shape around potential contact regions rather than focusing on detailed global object geometry, thereby enhancing generalization to shape variations and uncertainties. To address perception limitations, we integrate a privileged teacher policy with a mixed curriculum learning approach, allowing the student policy to effectively distill grasping capabilities and explore for adaptation to disturbances. Trained in simulation, our method achieves success rates of 97.0% across 247,786 simulated objects and 94.6% across 512 real objects, demonstrating remarkable generalization. Quantitative and qualitative results validate the robustness of our policy against various disturbances.
comment: Project Page: https://zdchan.github.io/Robust_DexGrasp/
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
Hierarchical reinforcement learning (HRL) is hypothesized to be able to take advantage of the inherent hierarchy in robot learning tasks with sparse reward schemes, in contrast to more traditional reinforcement learning algorithms. In this research, hierarchical reinforcement learning is evaluated and contrasted with standard reinforcement learning in complex navigation tasks. We evaluate unique characteristics of HRL, including their ability to create sub-goals and the termination function. We constructed experiments to test the differences between PPO and HRL, different ways of creating sub-goals, manual vs automatic sub-goal creation, and the effects of the frequency of termination on performance. These experiments highlight the advantages of HRL and how it achieves these advantages.
Generative Trajectory Stitching through Diffusion Composition
Effective trajectory stitching for long-horizon planning is a significant challenge in robotic decision-making. While diffusion models have shown promise in planning, they are limited to solving tasks similar to those seen in their training data. We propose CompDiffuser, a novel generative approach that can solve new tasks by learning to compositionally stitch together shorter trajectory chunks from previously seen tasks. Our key insight is modeling the trajectory distribution by subdividing it into overlapping chunks and learning their conditional relationships through a single bidirectional diffusion model. This allows information to propagate between segments during generation, ensuring physically consistent connections. We conduct experiments on benchmark tasks of various difficulties, covering different environment sizes, agent state dimension, trajectory types, training data quality, and show that CompDiffuser significantly outperforms existing methods.
comment: Project page: https://comp-diffuser.github.io/
Spatio-Tempora Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset
Monitoring orchards at the individual tree or fruit level throughout the growth season is crucial for plant phenotyping and horticultural resource optimization, such as chemical use and yield estimation. We present a 4D spatio-temporal metric-semantic mapping system that integrates multi-session measurements to track fruit growth over time. Our approach combines a LiDAR-RGB fusion module for 3D fruit localization with a 4D fruit association method leveraging positional, visual, and topology information for improved data association precision. Evaluated on real orchard data, our method achieves a 96.9% fruit counting accuracy for 1,790 apples across 60 trees, a mean fruit size estimation error of 1.1 cm, and a 23.7% improvement in 4D data association precision over baselines. We publicly release a multimodal dataset covering five fruit species across their growth seasons. https://4d-metric-semantic-mapping.org/
Analysis of the Unscented Transform for Cooperative Localization with Ranging-Only Information
Cooperative localization in multi-agent robotic systems is challenging, especially when agents rely on limited information, such as only peer-to-peer range measurements. Two key challenges arise: utilizing this limited information to improve position estimation; handling uncertainties from sensor noise, nonlinearity, and unknown correlations between agents measurements; and avoiding information reuse. This paper examines the use of the Unscented Transform (UT) for state estimation for a case in which range measurement between agents and covariance intersection (CI) is used to handle unknown correlations. Unlike Kalman Filter approaches, CI methods fuse complete state and covariance estimates. This makes formulating a CI approach with ranging-only measurements a challenge. To overcome this, UT is used to handle uncertainties and formulate a cooperative state update using range measurements and current cooperative state estimates. This introduces information reuse in the measurement update. Therefore, this work aims to evaluate the limitations and utility of this formulation when faced with various levels of state measurement uncertainty and errors.
comment: 8 pages, 8 figures. The paper will be presented at the 2025 IEEE/ION Position, Location and Navigation Symposium (PLANS)
High and Low Resolution Tradeoffs in Roadside Multimodal Sensing
Balancing cost and performance is crucial when choosing high- versus low-resolution point-cloud roadside sensors. For example, LiDAR delivers dense point cloud, while 4D millimeter-wave radar, though spatially sparser, embeds velocity cues that help distinguish objects and come at a lower price. Unfortunately, the sensor placement strategies will influence point cloud density and distribution across the coverage area. Compounding the first challenge is the fact that different sensor mixtures often demand distinct neural network architectures to maximize their complementary strengths. Without an evaluation framework that establishes a benchmark for comparison, it is imprudent to make claims regarding whether marginal gains result from higher resolution and new sensing modalities or from the algorithms. We present an ex-ante evaluation that addresses the two challenges. First, we realized a simulation tool that builds on integer programming to automatically compare different sensor placement strategies against coverage and cost jointly. Additionally, inspired by human multi-sensory integration, we propose a modular framework to assess whether reductions in spatial resolution can be compensated by informational richness in detecting traffic participants. Extensive experimental testing on the proposed framework shows that fusing velocity-encoded radar with low-resolution LiDAR yields marked gains (14 percent AP for pedestrians and an overall mAP improvement of 1.5 percent across six categories) at lower cost than high-resolution LiDAR alone. Notably, these marked gains hold regardless of the specific deep neural modules employed in our frame. The result challenges the prevailing assumption that high resolution are always superior to low-resolution alternatives.
comment: 8 pages, 7 figures
AC-LIO: Towards Asymptotic and Consistent Convergence in LiDAR-Inertial Odometry
Existing LiDAR-Inertial Odometry (LIO) methods typically utilize the prior state trajectory derived from the IMU integration to compensate for the motion distortion within LiDAR frames. However, discrepancies between the prior and actual trajectory can lead to residual distortions that compromise the consistency of the LiDAR frame with its corresponding geometric environment. This imbalance may result in pointcloud registration becoming trapped in local optima, thereby exacerbating drift during long-term and large-scale localization. To address the issue, we propose a novel asymptotically and consistently converging LIO framework dubbed AC-LIO. Our key idea is to back propagate current update term based on the prior state chain, and asymptotically compensate for the residual distortion during iteration. Moreover, considering the weak correlation between previous error and current distortion, we establish convergence criteria based on the pointcloud constraints to regulate the backpropagation. This method of guiding asymptotic distortion compensation using convergence criteria subtly enhances the consistent convergence of pointcloud registration, futher improving the accuracy and robustness of LIO system. Extensive experiments demonstrate that our AC-LIO framework significantly promotes consistent convergence in state estimation compared to prior arts, with about 30.4% reduction in average RMSE over the second best result, leading to marked improvements in the accuracy of long-term and large-scale localization and mapping.
comment: 8 pages, 6 figures
In vivo validation of Wireless Power Transfer System for Magnetically Controlled Robotic Capsule Endoscopy
This paper presents the in vivo validation of an inductive wireless power transfer (WPT) system integrated for the first time into a magnetically controlled robotic capsule endoscopy platform. The proposed system enables continuous power delivery to the capsule without the need for onboard batteries, thus extending operational time and reducing size constraints. The WPT system operates through a resonant inductive coupling mechanism, based on a transmitting coil mounted on the end effector of a robotic arm that also houses an external permanent magnet and a localization coil for precise capsule manipulation. To ensure robust and stable power transmission in the presence of coil misalignment and rotation, a 3D receiving coil is integrated within the capsule. Additionally, a closed-loop adaptive control system, based on load-shift keying (LSK) modulation, dynamically adjusts the transmitted power to optimize efficiency while maintaining compliance with specific absorption rate (SAR) safety limits. The system has been extensively characterized in laboratory settings and validated through in vivo experiments using a porcine model, demonstrating reliable power transfer and effective robotic navigation in realistic gastrointestinal conditions: the average received power was 110 mW at a distance of 9 cm between the coils, with variable capsule rotation angles. The results confirm the feasibility of the proposed WPT approach for autonomous, battery-free robotic capsule endoscopy, paving the way for enhanced diagnostic in gastrointestinal medicine.
comment: 9 pages, 9 figures, regular paper
Inverse Dynamics Trajectory Optimization for Contact-Implicit Model Predictive Control
Robots must make and break contact with the environment to perform useful tasks, but planning and control through contact remains a formidable challenge. In this work, we achieve real-time contact-implicit model predictive control with a surprisingly simple method: inverse dynamics trajectory optimization. While trajectory optimization with inverse dynamics is not new, we introduce a series of incremental innovations that collectively enable fast model predictive control on a variety of challenging manipulation and locomotion tasks. We implement these innovations in an open-source solver and present simulation examples to support the effectiveness of the proposed approach. Additionally, we demonstrate contact-implicit model predictive control on hardware at over 100 Hz for a 20-degree-of-freedom bi-manual manipulation task. Video and code are available at https://idto.github.io.
comment: IJRR accepted version
Smoothing of Headland Path Edges and Headland-to-Mainfield Lane Transitions Based on a Spatial Domain Transformation and Linear Programming
Within the context of in-field path planning and under the assumption of nonholonomic vehicle models this paper addresses two tasks: smoothing of headland path edges and smoothing of headland-to-mainfield lane transitions. Both tasks are solved by a two-step hierarchical algorithm. The first step differs for the two tasks generating either a piecewise-affine or a Dubins reference path. The second step leverages a transformation of vehicle dynamics from the time domain into the spatial domain and linear programming. Benefits such as a hyperparameter-free objective function and spatial constraints useful for area coverage gaps avoidance and precision path planning are discussed. The method, which is a deterministic optimisation-based method, is evaluated on 5 real-world fields solving 19 instances of the first task and 84 instances of the second task.
comment: 13 pages, 12 figures, 4 tables
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Contact-rich manipulation plays an important role in daily human activities. However, uncertain physical parameters often pose significant challenges for both planning and control. A promising strategy is to develop policies that are robust across a wide range of parameters. Domain adaptation and domain randomization are widely used, but they tend to either limit generalization to new instances or perform conservatively due to neglecting instance-specific information. \textit{Explicit motor adaptation} addresses these issues by estimating system parameters online and then retrieving the parameter-conditioned policy from a parameter-augmented base policy. However, it typically requires precise system identification or additional training of a student policy, both of which are challenging in contact-rich manipulation tasks with diverse physical parameters. In this work, we propose \textit{implicit motor adaptation}, which enables parameter-conditioned policy retrieval given a roughly estimated parameter distribution instead of a single estimate. We leverage tensor train as an implicit representation of the base policy, facilitating efficient retrieval of the parameter-conditioned policy by exploiting the separable structure of tensor cores. This framework eliminates the need for precise system estimation and policy retraining while preserving optimal behavior and strong generalization. We provide a theoretical analysis to validate the approach, supported by numerical evaluations on three contact-rich manipulation primitives. Both simulation and real-world experiments demonstrate its ability to generate robust policies across diverse instances. Project website: \href{https://sites.google.com/view/implicit-ma}{https://sites.google.com/view/implicit-ma}.
When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning
The integration of pre-trained visual representations (PVRs) into visuo-motor robot learning has emerged as a promising alternative to training visual encoders from scratch. However, PVRs face critical challenges in the context of policy learning, including temporal entanglement and an inability to generalise even in the presence of minor scene perturbations. These limitations hinder performance in tasks requiring temporal awareness and robustness to scene changes. This work identifies these shortcomings and proposes solutions to address them. First, we augment PVR features with temporal perception and a sense of task completion, effectively disentangling them in time. Second, we introduce a module that learns to selectively attend to task-relevant local features, enhancing robustness when evaluated on out-of-distribution scenes. Our experiments demonstrate significant performance improvements, particularly in PVRs trained with masking objectives, and validate the effectiveness of our enhancements in addressing PVR-specific limitations.
comment: Project Page: https://tsagkas.github.io/pvrobo/
ForesightNav: Learning Scene Imagination for Efficient Exploration
Understanding how humans leverage prior knowledge to navigate unseen environments while making exploratory decisions is essential for developing autonomous robots with similar abilities. In this work, we propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. These predictions enable the robot to efficiently select meaningful long-term navigation goals, significantly enhancing exploration in unseen environments. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry. Our experiments show that the imagination module improves exploration efficiency in unseen environments, achieving a 100% completion rate for PointNav and an SPL of 67% for ObjectNav on the Structured3D Validation split. These contributions demonstrate the power of imagination-driven reasoning for autonomous systems to enhance generalizable and efficient exploration.
A Model Predictive Capture Point Control Framework for Robust Humanoid Balancing via Ankle, Hip, and Stepping Strategies
The robust balancing capability of humanoids is essential for mobility in real environments. Many studies focus on implementing human-inspired ankle, hip, and stepping strategies to achieve human-level balance. In this paper, a robust balance control framework for humanoids is proposed. Firstly, a Model Predictive Control (MPC) framework is proposed for Capture Point (CP) tracking control, enabling the integration of ankle, hip, and stepping strategies within a single framework. Additionally, a variable weighting method is introduced that adjusts the weighting parameters of the Centroidal Angular Momentum damping control. Secondly, a hierarchical structure of the MPC and a stepping controller was proposed, allowing for the step time optimization. The robust balancing performance of the proposed method is validated through simulations and real robot experiments. Furthermore, a superior balancing performance is demonstrated compared to a state-of-the-art Quadratic Programming-based CP controller that employs the ankle, hip, and stepping strategies.
comment: 20 pages,13 figures
Variational OOD State Correction for Offline Reinforcement Learning
The performance of Offline reinforcement learning is significantly impacted by the issue of state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to address this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby promoting its operation within or the return to in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that concurrently considers both the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of our proposed method through extensive experimental evaluations on the offline MuJoCo and AntMaze suites.
PhysicsFC: Learning User-Controlled Skills for a Physics-Based Football Player Controller SIGGRAPH 2025
We propose PhysicsFC, a method for controlling physically simulated football player characters to perform a variety of football skills--such as dribbling, trapping, moving, and kicking--based on user input, while seamlessly transitioning between these skills. Our skill-specific policies, which generate latent variables for each football skill, are trained using an existing physics-based motion embedding model that serves as a foundation for reproducing football motions. Key features include a tailored reward design for the Dribble policy, a two-phase reward structure combined with projectile dynamics-based initialization for the Trap policy, and a Data-Embedded Goal-Conditioned Latent Guidance (DEGCL) method for the Move policy. Using the trained skill policies, the proposed football player finite state machine (PhysicsFC FSM) allows users to interactively control the character. To ensure smooth and agile transitions between skill policies, as defined in the FSM, we introduce the Skill Transition-Based Initialization (STI), which is applied during the training of each skill policy. We develop several interactive scenarios to showcase PhysicsFC's effectiveness, including competitive trapping and dribbling, give-and-go plays, and 11v11 football games, where multiple PhysicsFC agents produce natural and controllable physics-based football player behaviors. Quantitative evaluations further validate the performance of individual skill policies and the transitions between them, using the presented metrics and experimental designs.
comment: Accepted to SIGGRAPH 2025 and ACM TOG
Towards Uncertainty Unification: A Case Study for Preference Learning
Learning human preferences is essential for human-robot interaction, as it enables robots to adapt their behaviors to align with human expectations and goals. However, the inherent uncertainties in both human behavior and robotic systems make preference learning a challenging task. While probabilistic robotics algorithms offer uncertainty quantification, the integration of human preference uncertainty remains underexplored. To bridge this gap, we introduce uncertainty unification and propose a novel framework, uncertainty-unified preference learning (UUPL), which enhances Gaussian Process (GP)-based preference learning by unifying human and robot uncertainties. Specifically, UUPL includes a human preference uncertainty model that improves GP posterior mean estimation, and an uncertainty-weighted Gaussian Mixture Model (GMM) that enhances GP predictive variance accuracy. Additionally, we design a user-specific calibration process to align uncertainty representations across users, ensuring consistency and reliability in the model performance. Comprehensive experiments and user studies demonstrate that UUPL achieves state-of-the-art performance in both prediction accuracy and user rating. An ablation study further validates the effectiveness of human uncertainty model and uncertainty-weighted GMM of UUPL.
comment: Project page: https://sites.google.com/view/uupl-rss25/home
Reward-Based Collision-Free Algorithm for Trajectory Planning of Autonomous Robots
This paper proposes a novel mission planning algorithm for autonomous robots that selects an optimal waypoint sequence from a predefined set to maximize total reward while satisfying obstacle avoidance, state, input, derivative, mission time, and distance constraints. The formulation extends the prize-collecting traveling salesman problem. A tailored genetic algorithm evolves candidate solutions using a fitness function, crossover, and mutation, with constraint enforcement via a penalty method. Differential flatness and clothoid curves are employed to penalize infeasible trajectories efficiently, while the Euler spiral method ensures curvature-continuous trajectories with bounded curvature, enhancing dynamic feasibility and mitigating oscillations typical of minimum-jerk and snap parameterizations. Due to the discrete variable length optimization space, crossover is performed using a dynamic time-warping-based method and extended convex combination with projection. The algorithm's performance is validated through simulations and experiments with a ground vehicle, quadrotor, and quadruped, supported by benchmarking and time-complexity analysis.
Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation. By randomizing environment properties during training, the learned policy can become robust to uncertainties along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate automatically discovering a sampling distribution via entropy-regularized reward maximization of a normalizing-flow-based neural sampling distribution. We show that this architecture is more flexible and provides greater robustness than existing approaches that learn simpler, parameterized sampling distributions, as demonstrated in six simulated and one real-world robotics domain. Lastly, we explore how these learned sampling distributions, combined with a privileged value function, can be used for out-of-distribution detection in an uncertainty-aware multi-step manipulation planner.
Multiagent Systems
Beyond the model: Key differentiators in large language models and multi-agent services
With the launch of foundation models like DeepSeek, Manus AI, and Llama 4, it has become evident that large language models (LLMs) are no longer the sole defining factor in generative AI. As many now operate at comparable levels of capability, the real race is not about having the biggest model but optimizing the surrounding ecosystem, including data quality and management, computational efficiency, latency, and evaluation frameworks. This review article delves into these critical differentiators that ensure modern AI services are efficient and profitable.
comment: 4 pages
El Agente: An Autonomous Agent for Quantum Chemistry
Computational chemistry tools are widely used to study the behaviour of chemical phenomena. Yet, the complexity of these tools can make them inaccessible to non-specialists and challenging even for experts. In this work, we introduce El Agente Q, an LLM-based multi-agent system that dynamically generates and executes quantum chemistry workflows from natural language user prompts. The system is built on a novel cognitive architecture featuring a hierarchical memory framework that enables flexible task decomposition, adaptive tool selection, post-analysis, and autonomous file handling and submission. El Agente Q is benchmarked on six university-level course exercises and two case studies, demonstrating robust problem-solving performance (averaging >87% task success) and adaptive error handling through in situ debugging. It also supports longer-term, multi-step task execution for more complex workflows, while maintaining transparency through detailed action trace logs. Together, these capabilities lay the foundation for increasingly autonomous and accessible quantum chemistry.
The Cognitive Foundations of Economic Exchange: A Modular Framework Grounded in Behavioral Evidence
A key challenge in multi-agent AI is modeling social cooperation under realistic behavioral constraints. Many foundational concepts in economics and ethics such as "trust" or "morality" are often defined informally, without operational criteria or cognitive grounding, which limits their testability and implementation in artificial agents. Drawing on converging empirical evidence from primate behavior, infant cognition, and economic anthropology, we propose a conceptual framework composed of three cognitively minimal mechanisms: individual recognition, reciprocal credence, and cost return sensitivity. This framework reframes trust as a graded cognitive expectation, providing a simulateable basis for reciprocal exchange in artificial agents, and enabling the bottom-up emergence of scalable cooperation and institutional dynamics.
comment: This is a position paper. Theoretical framework is finalized; minor language revisions are ongoing. Submitted for feedback and public discussion
Moving From Monolithic To Microservices Architecture for Multi-Agent Systems
The transition from monolithic to microservices architecture revolutionized software development by improving scalability and maintainability. This paradigm shift is now becoming relevant for complex multi-agent systems (MAS). This review article explores the evolution from monolithic architecture to microservices architecture in the specific context of MAS. It will highlight the limitations of traditional monolithic MAS and the benefits of adopting a microservices-based approach. The article further examines the core architectural principles and communication protocols, including Agent Communication Languages (ACLs), the Model Context Protocol (MCP), and the Application-to-Application (A2A) protocol. The article identifies emerging architectural patterns, design challenges, and considerations through a comparative lens of the paradigm shift.
comment: 5 pages, comparative analysis
Modeling AI-Human Collaboration as a Multi-Agent Adaptation
We develop an agent-based simulation to formalize AI-human collaboration as a function of task structure, advancing a generalizable framework for strategic decision-making in organizations. Distinguishing between heuristic-based human adaptation and rule-based AI search, we model interactions across modular (parallel) and sequenced (interdependent) tasks using an NK model. Our results reveal that in modular tasks, AI often substitutes for humans - delivering higher payoffs unless human expertise is very high, and the AI search space is either narrowly focused or extremely broad. In sequenced tasks, interesting complementarities emerge. When an expert human initiates the search and AI subsequently refines it, aggregate performance is maximized. Conversely, when AI leads, excessive heuristic refinement by the human can reduce payoffs. We also show that even "hallucinatory" AI - lacking memory or structure - can improve outcomes when augmenting low-capability humans by helping escape local optima. These results yield a robust implication: the effectiveness of AI-human collaboration depends less on context or industry, and more on the underlying task structure. By elevating task decomposition as the central unit of analysis, our model provides a transferable lens for strategic decision-making involving humans and an agentic AI across diverse organizational settings.
Systems and Control (CS)
Structured Estimators: A New Perspective on Information Freshness
In recent literature, when modeling for information freshness in remote estimation settings, estimators have been mainly restricted to the class of martingale estimators, meaning the remote estimate at any time is equal to the most recently received update. This is mainly due to its simplicity and ease of analysis. However, these martingale estimators are far from optimal in some cases, especially in pull-based update systems. For such systems, maximum aposteriori probability (MAP) estimators are optimum, but can be challenging to analyze. Here, we introduce a new class of estimators, called structured estimators, which retain useful characteristics from a MAP estimate while still being analytically tractable. Our proposed estimators move seamlessly from a martingale estimator to a MAP estimator.
Online Phase Estimation of Human Oscillatory Motions using Deep Learning
Accurately estimating the phase of oscillatory systems is essential for analyzing cyclic activities such as repetitive gestures in human motion. In this work we introduce a learning-based approach for online phase estimation in three-dimensional motion trajectories, using a Long Short- Term Memory (LSTM) network. A calibration procedure is applied to standardize trajectory position and orientation, ensuring invariance to spatial variations. The proposed model is evaluated on motion capture data and further tested in a dynamical system, where the estimated phase is used as input to a reinforcement learning (RL)-based control to assess its impact on the synchronization of a network of Kuramoto oscillators.
Wise Goose Chase: A Predictive Path Planning Algorithm for Dynamic Rebalancing in Ride-Hailing Systems
Traditional rebalancing methods in ride-hailing systems direct idle drivers to fixed destinations, overlooking the fact that ride allocations frequently occur while cruising. This destination-centric view fails to exploit the path-dependent nature of modern platforms, where real-time matching depends on the entire trajectory rather than a static endpoint. We propose the Wise Goose Chase (WGC) algorithm, an event-triggered, driver-specific path planning framework that anticipates future matching opportunities by forecasting spatio-temporal supply and demand dynamics. WGC uses a system of Retarded Functional Differential Equations (RFDEs) to model the evolution of idle driver density and passenger queues at the road-segment level, incorporating both en-route matching and competition among drivers. Upon request, WGC computes personalized cruising paths that minimize each driver's expected time to allocation. Monte Carlo simulations on synthetic urban networks show that WGC consistently outperforms baseline strategies, highlighting the advantage of predictive, context-aware rebalancing in dynamic mobility systems.
Maximal Compatibility Matching for Preference-Aware Ride-Hailing Systems
This paper presents the Maximal Compatibility Matching (MCM) framework, a novel assignment strategy for ride-hailing systems that explicitly incorporates passenger comfort into the matching process. Traditional assignment methods prioritize spatial efficiency, but often overlook behavioral alignment between passengers and drivers, which can significantly impact user satisfaction. MCM addresses this gap by learning personalized passenger comfort zones using gradient-boosted decision tree classifiers trained on labeled ride data, and by modeling driver behavior through empirical operating profiles constructed from time-series driving features. Compatibility between a passenger and a driver is computed as the closed-form volume of intersection between their respective feature-space regions. These compatibility scores are integrated into a utility-based matching algorithm that balances comfort and proximity through a tunable trade-off parameter. We validate the framework using a Unity-based driving simulator with real-time passenger feedback, demonstrating that MCM enables more personalized and socially acceptable matchings while maintaining high levels of operational performance.
LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots
Integrating artificial intelligence (AI) and stochastic technologies into the mobile robot navigation and control (MRNC) framework while adhering to rigorous safety standards presents significant challenges. To address these challenges, this paper proposes a comprehensively integrated MRNC framework for skid-steer wheeled mobile robots (SSWMRs), in which all components are actively engaged in real-time execution. The framework comprises: 1) a LiDAR-inertial simultaneous localization and mapping (SLAM) algorithm for estimating the current pose of the robot within the built map; 2) an effective path-following control system for generating desired linear and angular velocity commands based on the current pose and the desired pose; 3) inverse kinematics for transferring linear and angular velocity commands into left and right side velocity commands; and 4) a robust AI-driven (RAID) control system incorporating a radial basis function network (RBFN) with a new adaptive algorithm to enforce in-wheel actuation systems to track each side motion commands. To further meet safety requirements, the proposed RAID control within the MRNC framework of the SSWMR constrains AI-generated tracking performance within predefined overshoot and steady-state error limits, while ensuring robustness and system stability by compensating for modeling errors, unknown RBF weights, and external forces. Experimental results verify the proposed MRNC framework performance for a 4,836 kg SSWMR operating on soft terrain.
comment: This paper has been submitted in the IEEE CDC 2025 for potential presentation
Learning and Online Replication of Grasp Forces from Electromyography Signals for Prosthetic Finger Control ICRA 2025
Partial hand amputations significantly affect the physical and psychosocial well-being of individuals, yet intuitive control of externally powered prostheses remains an open challenge. To address this gap, we developed a force-controlled prosthetic finger activated by electromyography (EMG) signals. The prototype, constructed around a wrist brace, functions as a supernumerary finger placed near the index, allowing for early-stage evaluation on unimpaired subjects. A neural network-based model was then implemented to estimate fingertip forces from EMG inputs, allowing for online adjustment of the prosthetic finger grip strength. The force estimation model was validated through experiments with ten participants, demonstrating its effectiveness in predicting forces. Additionally, online trials with four users wearing the prosthesis exhibited precise control over the device. Our findings highlight the potential of using EMG-based force estimation to enhance the functionality of prosthetic fingers.
comment: 7 pages, 6 figures, to be presented at ICRA 2025
Data-Driven Energy Modeling of Industrial IoT Systems: A Benchmarking Approach
The widespread adoption of IoT has driven the development of cyber-physical systems (CPS) in industrial environments, leveraging Industrial IoTs (IIoTs) to automate manufacturing processes and enhance productivity. The transition to autonomous systems introduces significant operational costs, particularly in terms of energy consumption. Accurate modeling and prediction of IIoT energy requirements are critical, but traditional physics- and engineering-based approaches often fall short in addressing these challenges comprehensively. In this paper, we propose a novel methodology for benchmarking and analyzing IIoT devices and applications to uncover insights into their power demands, energy consumption, and performance. To demonstrate this methodology, we develop a comprehensive framework and apply it to study an industrial CPS comprising an educational robotic arm, a conveyor belt, a smart camera, and a compute node. By creating micro-benchmarks and an end-to-end application within this framework, we create an extensive performance and power consumption dataset, which we use to train and analyze ML models for predicting energy usage from features of the application and the CPS system. The proposed methodology and framework provide valuable insights into the energy dynamics of industrial CPS, offering practical implications for researchers and practitioners aiming to enhance the efficiency and sustainability of IIoT-driven automation.
ZeloS -- A Research Platform for Early-Stage Validation of Research Findings Related to Automated Driving
This paper presents ZeloS, a research platform designed and built for practical validation of automated driving methods in an early stage of research. We overview ZeloS' hardware setup and automation architecture and focus on motion planning and control. ZeloS weighs 69 kg, measures a length of 117 cm, and is equipped with all-wheel steering, all-wheel drive, and various onboard sensors for localization. The hardware setup and the automation architecture of ZeloS are designed and built with a focus on modularity and the goal of being simple yet effective. The modular design allows the modification of individual automation modules without the need for extensive onboarding into the automation architecture. As such, this design supports ZeloS in being a versatile research platform for validating various automated driving methods. The motion planning component and control of ZeloS feature optimization-based methods that allow for explicitly considering constraints. We demonstrate the hardware and automation setup by presenting experimental data.
ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning
The building thermodynamics model, which predicts real-time indoor temperature changes under potential HVAC (Heating, Ventilation, and Air Conditioning) control operations, is crucial for optimizing HVAC control in buildings. While pioneering studies have attempted to develop such models for various building environments, these models often require extensive data collection periods and rely heavily on expert knowledge, making the modeling process inefficient and limiting the reusability of the models. This paper explores a model ensemble perspective that utilizes existing developed models as base models to serve a target building environment, thereby providing accurate predictions while reducing the associated efforts. Given that building data streams are non-stationary and the number of base models may increase, we propose a Hierarchical Reinforcement Learning (HRL) approach to dynamically select and weight the base models. Our approach employs a two-tiered decision-making process: the high-level focuses on model selection, while the low-level determines the weights of the selected models. We thoroughly evaluate the proposed approach through offline experiments and an on-site case study, and the experimental results demonstrate the effectiveness of our method.
Impact of Transceiver Selection on Synchronization Accuracy in White Rabbit Networks
Achieving optimal synchronization accuracy between two White Rabbit devices hinges on the proper selection of transceivers, which act as electro-optical converters connecting WR devices to the optical network infrastructure. The correct choice of transceivers can significantly improve resilience to changes in the time offset between WR devices due to temperature fluctuations in the connecting optical fiber. To compare the performance of BiDi WDM and DWDM transceivers, an experimental setup was established under laboratory conditions to simulate a real optical network used for distributing precise time and frequency between two remote locations. The optical connection was emulated by integrating a 20 km G.652.D optical fiber into a climatic chamber, which provided variable environmental conditions similar to those experienced in real applications. The study compared BiDi WDM 1310/1550 nm transceivers with DWDM Ch33/Ch34 transceivers. Results showed that DWDM transceivers exhibited nearly thirteen times less sensitivity to temperature-induced changes in the optical connection, leading to a smaller time offset. Therefore, for achieving the highest accuracy in synchronizing WR devices in practical applications, DWDM transceiver technology is essential.
comment: 5 pages, 4 figures, conference paper
Quadrupedal Spine Control Strategies: Exploring Correlations Between System Dynamic Responses and Human Perspectives
Unlike their biological cousins, the majority of existing quadrupedal robots are constructed with rigid chassis. This results in motion that is either beetle-like or distinctly robotic, lacking the natural fluidity characteristic of mammalian movements. Existing literature on quadrupedal robots with spinal configurations primarily focuses on energy efficiency and does not consider the effects in human-robot interaction scenarios. Our contributions include an initial investigation into various trajectory generation strategies for a quadrupedal robot with a four degree of freedom spine, and an analysis on the effect that such methods have on human perception of gait naturalness compared to a fixed spine baseline. The strategies were evaluated using videos of walking, trotting and turning simulations. Among the four different strategies developed, the optimised time varying and the foot-tracking strategies were perceived to be more natural than the baseline in a randomised trial with 50 participants. Although none of the strategies demonstrated any energy efficiency improvements over the no-spine baseline, some showed greater footfall consistency at higher speeds. Given the greater likeability drawn from the more natural locomotion patterns, this type of robot displays potential for applications in social robot scenarios such as elderly care, where energy efficiency is not a primary concern.
comment: 27 pages, 13 figures
A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints
We present a real-time safety filter for motion planning, such as learning-based methods, using Control Barrier Functions (CBFs), which provides formal guarantees for collision avoidance with road boundaries. A key feature of our approach is its ability to directly incorporate road geometries of arbitrary shape without resorting to conservative overapproximations. We formulate the safety filter as a constrained optimization problem in the form of a Quadratic Program (QP). It achieves safety by making minimal, necessary adjustments to the control actions issued by the nominal motion planner. We validate our safety filter through extensive numerical experiments across a variety of traffic scenarios featuring complex roads. The results confirm its reliable safety and high computational efficiency (execution frequency up to 40 Hz). Code & Video Demo: github.com/bassamlab/SigmaRL
Riemannian Direct Trajectory Optimization of Rigid Bodies on Matrix Lie Groups
Designing dynamically feasible trajectories for rigid bodies is a fundamental problem in robotics. Although direct trajectory optimization is widely applied to solve this problem, inappropriate parameterizations of rigid body dynamics often result in slow convergence and violations of the intrinsic topological structure of the rotation group. This paper introduces a Riemannian optimization framework for direct trajectory optimization of rigid bodies. We first use the Lie Group Variational Integrator to formulate the discrete rigid body dynamics on matrix Lie groups. We then derive the closed-form first- and second-order Riemannian derivatives of the dynamics. Finally, this work applies a line-search Riemannian Interior Point Method (RIPM) to perform trajectory optimization with general nonlinear constraints. As the optimization is performed on matrix Lie groups, it is correct-by-construction to respect the topological structure of the rotation group and be free of singularities. The paper demonstrates that both the derivative evaluations and Newton steps required to solve the RIPM exhibit linear complexity with respect to the planning horizon and system degrees of freedom. Simulation results illustrate that the proposed method is faster than conventional methods by an order of magnitude in challenging robotics tasks.
comment: Accepted to Robotics: Science and Systems (RSS) 2025
Robustly Invertible Nonlinear Dynamics and the BiLipREN: Contracting Neural Models with Contracting Inverses
We study the invertibility of nonlinear dynamical systems from the perspective of contraction and incremental stability analysis and propose a new invertible recurrent neural model: the BiLipREN. In particular, we consider a nonlinear state space model to be robustly invertible if an inverse exists with a state space realisation, and both the forward model and its inverse are contracting, i.e. incrementally exponentially stable, and Lipschitz, i.e. have bounded incremental gain. This property of bi-Lipschitzness implies both robustness in the sense of sensitivity to input perturbations, as well as robust distinguishability of different inputs from their corresponding outputs, i.e. the inverse model robustly reconstructs the input sequence despite small perturbations to the initial conditions and measured output. Building on this foundation, we propose a parameterization of neural dynamic models: bi-Lipschitz recurrent equilibrium networks (biLipREN), which are robustly invertible by construction. Moreover, biLipRENs can be composed with orthogonal linear systems to construct more general bi-Lipschitz dynamic models, e.g., a nonlinear analogue of minimum-phase/all-pass (inner/outer) factorization. We illustrate the utility of our proposed approach with numerical examples.
$\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems by multivariate rational interpolation
This paper addresses the $\mathcal{H}_2$-optimal approximation of linear dynamical systems with quadratic-output functions, also known as linear quadratic-output systems. Our major contributions are threefold. First, we derive interpolation-based first-order optimality conditions for the linear quadratic-output $\mathcal{H}_2$ minimization problem. These conditions correspond to the mixed-multipoint tangential interpolation of the full-order linear- and quadratic-output transfer functions, and generalize the Meier-Luenberger optimality framework for the $\mathcal{H}_2$-optimal model reduction of linear time-invariant systems. Second, given the interpolation data, we show how to enforce these mixed-multipoint tangential interpolation conditions explicitly by Petrov-Galerkin projection of the full-order model matrices. Third, to find the optimal interpolation data, we build on this projection framework and propose a generalization of the iterative rational Krylov algorithm for the $\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems, called LQO-IRKA. Upon convergence, LQO-IRKA produces a reduced linear quadratic-output system that satisfies the interpolatory optimality conditions. The method only requires solving shifted linear systems and matrix-vector products, thus making it suitable for large-scale problems. Numerical examples are included to illustrate the effectiveness of the proposed method.
Computing in Integrated Terrestrial and Non-Terrestrial Networks: A Comprehensive Survey
The rapid growth of Internet-of-things (IoT) devices, smart vehicles, and other connected objects is driving demand for ubiquitous connectivity and intensive computing capacity. 5G and upcoming 6G networks are crucial to meeting these demands and the fast-evolving services and applications. However, traditional terrestrial networks face limitations in coverage and capacity. Integrated Terrestrial and Non-Terrestrial Networks (ITNTN) are emerging to address these challenges. In essence, ITNTN combines ground-based infrastructure with aerial, space, and water surface networks to provide seamless connectivity and computing resources anytime, anywhere. Given the stringent quality-of-service (QoS) of future services, edge computing will be an inseparable component of ITNTN. Consequently, we dive in this survey into current efforts of integrating cloud/fog/edge computing into ITNTN layers to facilitate stringent QoS services and address the data processing needs of modern applications. Since there have been only limited and partial efforts in integrating computing functionalities within ITNTN, we aim to extend the discussion to the full integration of computing and identifying the challenges and future research directions to achieve it.
comment: 20 pages, 7 figures, 9 tables. Submitted to IEEE Access
A Practical Approach Towards Inertia Estimation Using Ambient Synchrophasor Data
Real-time tracking of inertia is important because it reflects the power system's ability to withstand contingencies and maintain frequency security. This paper proposes a practical approach to estimate inertia using ambient phasor measurement unit (PMU) data and a partitioned form of the swing equation. The approach accounts for (bounded) uncertainties in network parameters and PMU measurements, enabling precise estimation of inertia and damping constants, as well as mechanical power inputs. Instead of assuming constant mechanical power input throughout, the approach leverages knowledge of power system operations to determine intervals when it is actually constant to maintain estimation consistency. Simulation results on the IEEE 14-bus system and IEEE 39 bus system integrated with renewable energy sources affirm the method's accuracy and applicability.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety and actively regulate environmental coupling, allowing, for instance, safe object manipulation under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
In vivo validation of Wireless Power Transfer System for Magnetically Controlled Robotic Capsule Endoscopy
This paper presents the in vivo validation of an inductive wireless power transfer (WPT) system integrated for the first time into a magnetically controlled robotic capsule endoscopy platform. The proposed system enables continuous power delivery to the capsule without the need for onboard batteries, thus extending operational time and reducing size constraints. The WPT system operates through a resonant inductive coupling mechanism, based on a transmitting coil mounted on the end effector of a robotic arm that also houses an external permanent magnet and a localization coil for precise capsule manipulation. To ensure robust and stable power transmission in the presence of coil misalignment and rotation, a 3D receiving coil is integrated within the capsule. Additionally, a closed-loop adaptive control system, based on load-shift keying (LSK) modulation, dynamically adjusts the transmitted power to optimize efficiency while maintaining compliance with specific absorption rate (SAR) safety limits. The system has been extensively characterized in laboratory settings and validated through in vivo experiments using a porcine model, demonstrating reliable power transfer and effective robotic navigation in realistic gastrointestinal conditions: the average received power was 110 mW at a distance of 9 cm between the coils, with variable capsule rotation angles. The results confirm the feasibility of the proposed WPT approach for autonomous, battery-free robotic capsule endoscopy, paving the way for enhanced diagnostic in gastrointestinal medicine.
comment: 9 pages, 9 figures, regular paper
Data-Driven Reachability Analysis for Piecewise Affine Systems
Hybrid systems play a crucial role in modeling real-world applications where discrete and continuous dynamics interact, including autonomous vehicles, power systems, and traffic networks. Safety verification for these systems requires determining whether system states can enter unsafe regions under given initial conditions and uncertainties, a question directly addressed by reachability analysis. However, hybrid systems present unique difficulties because their state space is divided into multiple regions with distinct dynamic models, causing traditional data-driven methods to produce inadequate over-approximations of reachable sets at region boundaries where dynamics change abruptly. This paper introduces a novel approach using hybrid zonotopes for data-driven reachability analysis of piecewise affine systems. Our method addresses the boundary transition problem by developing computational algorithms that calculate the family of set models guaranteed to contain the true system trajectories. Additionally, we extend and evaluate three methods for set-based estimation that account for input-output data with measurement noise.
comment: 8 pages, 6 figures
Observability conditions for neural state-space models with eigenvalues and their roots of unity
We operate through the lens of ordinary differential equations and control theory to study the concept of observability in the context of neural state-space models and the Mamba architecture. We develop strategies to enforce observability, which are tailored to a learning context, specifically where the hidden states are learnable at initial time, in conjunction to over its continuum, and high-dimensional. We also highlight our methods emphasize eigenvalues, roots of unity, or both. Our methods effectuate computational efficiency when enforcing observability, sometimes at great scale. We formulate observability conditions in machine learning based on classical control theory and discuss their computational complexity. Our nontrivial results are fivefold. We discuss observability through the use of permutations in neural applications with learnable matrices without high precision. We present two results built upon the Fourier transform that effect observability with high probability up to the randomness in the learning. These results are worked with the interplay of representations in Fourier space and their eigenstructure, nonlinear mappings, and the observability matrix. We present a result for Mamba that is similar to a Hautus-type condition, but instead employs an argument using a Vandermonde matrix instead of eigenvectors. Our final result is a shared-parameter construction of the Mamba system, which is computationally efficient in high exponentiation. We develop a training algorithm with this coupling, showing it satisfies a Robbins-Monro condition under certain orthogonality, while a more classical training procedure fails to satisfy a contraction with high Lipschitz constant.
comment: Corrections and improvements to objective functions and new experiments
A Systematic Literature Review on Safety of the Intended Functionality for Automated Driving Systems
In the automobile industry, ensuring the safety of automated vehicles equipped with the Automated Driving System (ADS) is becoming a significant focus due to the increasing development and deployment of automated driving. Automated driving depends on sensing both the external and internal environments of a vehicle, utilizing perception sensors and algorithms, and Electrical/Electronic (E/E) systems for situational awareness and response. ISO 21448 is the standard for Safety of the Intended Functionality (SOTIF) that aims to ensure that the ADS operate safely within their intended functionality. SOTIF focuses on preventing or mitigating potential hazards that may arise from the limitations or failures of the ADS, including hazards due to insufficiencies of specification, or performance insufficiencies, as well as foreseeable misuse of the intended functionality. However, the challenge lies in ensuring the safety of vehicles despite the limited availability of extensive and systematic literature on SOTIF. To address this challenge, a Systematic Literature Review (SLR) on SOTIF for the ADS is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The objective is to methodically gather and analyze the existing literature on SOTIF. The major contributions of this paper are: (i) presenting a summary of the literature by synthesizing and organizing the collective findings, methodologies, and insights into distinct thematic groups, and (ii) summarizing and categorizing the acknowledged limitations based on data extracted from an SLR of 51 research papers published between 2018 and 2023. Furthermore, research gaps are determined, and future research directions are proposed.
comment: Scheduled to be published in SAE journal as technical paper as a part of non-technical event and will be available as open access in 2025
Conformal Predictive Programming for Chance Constrained Optimization
We propose conformal predictive programming (CPP), a framework to solve chance constrained optimization problems, i.e., optimization problems with constraints that are functions of random variables. CPP utilizes samples from these random variables along with the quantile lemma - central to conformal prediction - to transform the chance constrained optimization problem into a deterministic problem with a quantile reformulation. CPP inherits a priori guarantees on constraint satisfaction from existing sample average approximation approaches for a class of chance constrained optimization problems, and it provides a posteriori guarantees that are of conditional and marginal nature otherwise. The strength of CPP is that it can easily support different variants of conformal prediction which have been (or will be) proposed within the conformal prediction community. To illustrate this, we present robust CPP to deal with distribution shifts in the random variables and Mondrian CPP to deal with class conditional chance constraints. To enable tractable solutions to the quantile reformulation, we present a mixed integer programming method (CPP-MIP) encoding, a bilevel optimization strategy (CPP-Bilevel), and a sampling-and-discarding optimization strategy (CPP-Discarding). We also extend CPP to deal with joint chance constrained optimization (JCCO). In a series of case studies, we show the validity of the aforementioned approaches, empirically compare CPP-MIP, CPP-Bilevel, as well as CPP-Discarding, and illustrate the advantage of CPP as compared to scenario approach.
Defense Strategies for Autonomous Multi-agent Systems: Ensuring Safety and Resilience Under Exponentially Unbounded FDI Attacks
False data injection attacks pose a significant threat to autonomous multi-agent systems (MASs). Existing attack-resilient control strategies generally have strict assumptions on the attack signals and overlook safety constraints, such as collision avoidance. In practical applications, leader agents equipped with advanced sensors or weaponry span a safe region to guide heterogeneous follower agents, ensuring coordinated operations while addressing collision avoidance to prevent financial losses and mission failures. This letter addresses these gaps by introducing and solving the safety-aware and attack-resilient (SAAR) control problem under exponentially unbounded false data injection (EU-FDI) attacks. Specifically, a novel attack-resilient observer layer (OL) is first designed to defend against EU-FDI attacks on the OL. Then, an attack-resilient compensational signal is designed to mitigate the adverse effects caused by the EU-FDI attack on control input layer (CIL). Finally, a SAAR controller is designed by solving a quadratic programming (QP) problem integrating control barrier function (CBF) certified collision-free safety constraints. Rigorous Lyapunov-based stability analysis certifies the SAAR controller's effectiveness in ensuring both safety and resilience. This study also pioneers a three-dimensional (3D) simulation of the SAAR containment control problem for heterogeneous MASs, demonstrating its applicability in realistic multi-agent scenarios.
Reward-Based Collision-Free Algorithm for Trajectory Planning of Autonomous Robots
This paper proposes a novel mission planning algorithm for autonomous robots that selects an optimal waypoint sequence from a predefined set to maximize total reward while satisfying obstacle avoidance, state, input, derivative, mission time, and distance constraints. The formulation extends the prize-collecting traveling salesman problem. A tailored genetic algorithm evolves candidate solutions using a fitness function, crossover, and mutation, with constraint enforcement via a penalty method. Differential flatness and clothoid curves are employed to penalize infeasible trajectories efficiently, while the Euler spiral method ensures curvature-continuous trajectories with bounded curvature, enhancing dynamic feasibility and mitigating oscillations typical of minimum-jerk and snap parameterizations. Due to the discrete variable length optimization space, crossover is performed using a dynamic time-warping-based method and extended convex combination with projection. The algorithm's performance is validated through simulations and experiments with a ground vehicle, quadrotor, and quadruped, supported by benchmarking and time-complexity analysis.
Linear Model of Aggregated Homogeneous Energy Storage Elements with Realizable Dispatch Guarantees
To optimize the dispatch of batteries, a model is required that can predict the state of charge (SOC) trajectory for a chosen open-loop power schedule to ensure admissibility (i.e., that schedule can be realized). However, battery dispatch optimization is inherently challenging since batteries cannot simultaneously charge and discharge, which begets a non-convex complementarity constraint. In this paper, we develop a novel composition of energy storage elements that can charge or discharge independently and provide a sufficient linear energy storage model of the composite battery. This permits convex optimization of the composite battery SOC trajectory while guaranteeing admissibility of the resulting (aggregated) power schedule and its disaggregation to the individual energy storage elements.
NPGA: A Unified Algorithmic Framework for Decentralized Constraint-Coupled Optimization
This work focuses on a class of general decentralized constraint-coupled optimization problems. We propose a novel nested primal-dual gradient algorithm (NPGA), which can achieve linear convergence under the weakest known condition, and its theoretical convergence rate surpasses all known results. More importantly, NPGA serves not only as an algorithm but also as a unified algorithmic framework, encompassing various existing algorithms as special cases. By designing different network matrices, we can derive numerous versions of NPGA and analyze their convergences by leveraging the convergence results of NPGA conveniently, thereby enabling the design of more efficient algorithms. Finally, we conduct numerical experiments to compare the convergence rates of NPGA and existing algorithms, providing empirical evidence for the superior performance of NPGA.
Systems and Control (EESS)
Structured Estimators: A New Perspective on Information Freshness
In recent literature, when modeling for information freshness in remote estimation settings, estimators have been mainly restricted to the class of martingale estimators, meaning the remote estimate at any time is equal to the most recently received update. This is mainly due to its simplicity and ease of analysis. However, these martingale estimators are far from optimal in some cases, especially in pull-based update systems. For such systems, maximum aposteriori probability (MAP) estimators are optimum, but can be challenging to analyze. Here, we introduce a new class of estimators, called structured estimators, which retain useful characteristics from a MAP estimate while still being analytically tractable. Our proposed estimators move seamlessly from a martingale estimator to a MAP estimator.
Online Phase Estimation of Human Oscillatory Motions using Deep Learning
Accurately estimating the phase of oscillatory systems is essential for analyzing cyclic activities such as repetitive gestures in human motion. In this work we introduce a learning-based approach for online phase estimation in three-dimensional motion trajectories, using a Long Short- Term Memory (LSTM) network. A calibration procedure is applied to standardize trajectory position and orientation, ensuring invariance to spatial variations. The proposed model is evaluated on motion capture data and further tested in a dynamical system, where the estimated phase is used as input to a reinforcement learning (RL)-based control to assess its impact on the synchronization of a network of Kuramoto oscillators.
Wise Goose Chase: A Predictive Path Planning Algorithm for Dynamic Rebalancing in Ride-Hailing Systems
Traditional rebalancing methods in ride-hailing systems direct idle drivers to fixed destinations, overlooking the fact that ride allocations frequently occur while cruising. This destination-centric view fails to exploit the path-dependent nature of modern platforms, where real-time matching depends on the entire trajectory rather than a static endpoint. We propose the Wise Goose Chase (WGC) algorithm, an event-triggered, driver-specific path planning framework that anticipates future matching opportunities by forecasting spatio-temporal supply and demand dynamics. WGC uses a system of Retarded Functional Differential Equations (RFDEs) to model the evolution of idle driver density and passenger queues at the road-segment level, incorporating both en-route matching and competition among drivers. Upon request, WGC computes personalized cruising paths that minimize each driver's expected time to allocation. Monte Carlo simulations on synthetic urban networks show that WGC consistently outperforms baseline strategies, highlighting the advantage of predictive, context-aware rebalancing in dynamic mobility systems.
Maximal Compatibility Matching for Preference-Aware Ride-Hailing Systems
This paper presents the Maximal Compatibility Matching (MCM) framework, a novel assignment strategy for ride-hailing systems that explicitly incorporates passenger comfort into the matching process. Traditional assignment methods prioritize spatial efficiency, but often overlook behavioral alignment between passengers and drivers, which can significantly impact user satisfaction. MCM addresses this gap by learning personalized passenger comfort zones using gradient-boosted decision tree classifiers trained on labeled ride data, and by modeling driver behavior through empirical operating profiles constructed from time-series driving features. Compatibility between a passenger and a driver is computed as the closed-form volume of intersection between their respective feature-space regions. These compatibility scores are integrated into a utility-based matching algorithm that balances comfort and proximity through a tunable trade-off parameter. We validate the framework using a Unity-based driving simulator with real-time passenger feedback, demonstrating that MCM enables more personalized and socially acceptable matchings while maintaining high levels of operational performance.
LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots
Integrating artificial intelligence (AI) and stochastic technologies into the mobile robot navigation and control (MRNC) framework while adhering to rigorous safety standards presents significant challenges. To address these challenges, this paper proposes a comprehensively integrated MRNC framework for skid-steer wheeled mobile robots (SSWMRs), in which all components are actively engaged in real-time execution. The framework comprises: 1) a LiDAR-inertial simultaneous localization and mapping (SLAM) algorithm for estimating the current pose of the robot within the built map; 2) an effective path-following control system for generating desired linear and angular velocity commands based on the current pose and the desired pose; 3) inverse kinematics for transferring linear and angular velocity commands into left and right side velocity commands; and 4) a robust AI-driven (RAID) control system incorporating a radial basis function network (RBFN) with a new adaptive algorithm to enforce in-wheel actuation systems to track each side motion commands. To further meet safety requirements, the proposed RAID control within the MRNC framework of the SSWMR constrains AI-generated tracking performance within predefined overshoot and steady-state error limits, while ensuring robustness and system stability by compensating for modeling errors, unknown RBF weights, and external forces. Experimental results verify the proposed MRNC framework performance for a 4,836 kg SSWMR operating on soft terrain.
comment: This paper has been submitted in the IEEE CDC 2025 for potential presentation
Learning and Online Replication of Grasp Forces from Electromyography Signals for Prosthetic Finger Control ICRA 2025
Partial hand amputations significantly affect the physical and psychosocial well-being of individuals, yet intuitive control of externally powered prostheses remains an open challenge. To address this gap, we developed a force-controlled prosthetic finger activated by electromyography (EMG) signals. The prototype, constructed around a wrist brace, functions as a supernumerary finger placed near the index, allowing for early-stage evaluation on unimpaired subjects. A neural network-based model was then implemented to estimate fingertip forces from EMG inputs, allowing for online adjustment of the prosthetic finger grip strength. The force estimation model was validated through experiments with ten participants, demonstrating its effectiveness in predicting forces. Additionally, online trials with four users wearing the prosthesis exhibited precise control over the device. Our findings highlight the potential of using EMG-based force estimation to enhance the functionality of prosthetic fingers.
comment: 7 pages, 6 figures, to be presented at ICRA 2025
Data-Driven Energy Modeling of Industrial IoT Systems: A Benchmarking Approach
The widespread adoption of IoT has driven the development of cyber-physical systems (CPS) in industrial environments, leveraging Industrial IoTs (IIoTs) to automate manufacturing processes and enhance productivity. The transition to autonomous systems introduces significant operational costs, particularly in terms of energy consumption. Accurate modeling and prediction of IIoT energy requirements are critical, but traditional physics- and engineering-based approaches often fall short in addressing these challenges comprehensively. In this paper, we propose a novel methodology for benchmarking and analyzing IIoT devices and applications to uncover insights into their power demands, energy consumption, and performance. To demonstrate this methodology, we develop a comprehensive framework and apply it to study an industrial CPS comprising an educational robotic arm, a conveyor belt, a smart camera, and a compute node. By creating micro-benchmarks and an end-to-end application within this framework, we create an extensive performance and power consumption dataset, which we use to train and analyze ML models for predicting energy usage from features of the application and the CPS system. The proposed methodology and framework provide valuable insights into the energy dynamics of industrial CPS, offering practical implications for researchers and practitioners aiming to enhance the efficiency and sustainability of IIoT-driven automation.
ZeloS -- A Research Platform for Early-Stage Validation of Research Findings Related to Automated Driving
This paper presents ZeloS, a research platform designed and built for practical validation of automated driving methods in an early stage of research. We overview ZeloS' hardware setup and automation architecture and focus on motion planning and control. ZeloS weighs 69 kg, measures a length of 117 cm, and is equipped with all-wheel steering, all-wheel drive, and various onboard sensors for localization. The hardware setup and the automation architecture of ZeloS are designed and built with a focus on modularity and the goal of being simple yet effective. The modular design allows the modification of individual automation modules without the need for extensive onboarding into the automation architecture. As such, this design supports ZeloS in being a versatile research platform for validating various automated driving methods. The motion planning component and control of ZeloS feature optimization-based methods that allow for explicitly considering constraints. We demonstrate the hardware and automation setup by presenting experimental data.
ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning
The building thermodynamics model, which predicts real-time indoor temperature changes under potential HVAC (Heating, Ventilation, and Air Conditioning) control operations, is crucial for optimizing HVAC control in buildings. While pioneering studies have attempted to develop such models for various building environments, these models often require extensive data collection periods and rely heavily on expert knowledge, making the modeling process inefficient and limiting the reusability of the models. This paper explores a model ensemble perspective that utilizes existing developed models as base models to serve a target building environment, thereby providing accurate predictions while reducing the associated efforts. Given that building data streams are non-stationary and the number of base models may increase, we propose a Hierarchical Reinforcement Learning (HRL) approach to dynamically select and weight the base models. Our approach employs a two-tiered decision-making process: the high-level focuses on model selection, while the low-level determines the weights of the selected models. We thoroughly evaluate the proposed approach through offline experiments and an on-site case study, and the experimental results demonstrate the effectiveness of our method.
Impact of Transceiver Selection on Synchronization Accuracy in White Rabbit Networks
Achieving optimal synchronization accuracy between two White Rabbit devices hinges on the proper selection of transceivers, which act as electro-optical converters connecting WR devices to the optical network infrastructure. The correct choice of transceivers can significantly improve resilience to changes in the time offset between WR devices due to temperature fluctuations in the connecting optical fiber. To compare the performance of BiDi WDM and DWDM transceivers, an experimental setup was established under laboratory conditions to simulate a real optical network used for distributing precise time and frequency between two remote locations. The optical connection was emulated by integrating a 20 km G.652.D optical fiber into a climatic chamber, which provided variable environmental conditions similar to those experienced in real applications. The study compared BiDi WDM 1310/1550 nm transceivers with DWDM Ch33/Ch34 transceivers. Results showed that DWDM transceivers exhibited nearly thirteen times less sensitivity to temperature-induced changes in the optical connection, leading to a smaller time offset. Therefore, for achieving the highest accuracy in synchronizing WR devices in practical applications, DWDM transceiver technology is essential.
comment: 5 pages, 4 figures, conference paper
Quadrupedal Spine Control Strategies: Exploring Correlations Between System Dynamic Responses and Human Perspectives
Unlike their biological cousins, the majority of existing quadrupedal robots are constructed with rigid chassis. This results in motion that is either beetle-like or distinctly robotic, lacking the natural fluidity characteristic of mammalian movements. Existing literature on quadrupedal robots with spinal configurations primarily focuses on energy efficiency and does not consider the effects in human-robot interaction scenarios. Our contributions include an initial investigation into various trajectory generation strategies for a quadrupedal robot with a four degree of freedom spine, and an analysis on the effect that such methods have on human perception of gait naturalness compared to a fixed spine baseline. The strategies were evaluated using videos of walking, trotting and turning simulations. Among the four different strategies developed, the optimised time varying and the foot-tracking strategies were perceived to be more natural than the baseline in a randomised trial with 50 participants. Although none of the strategies demonstrated any energy efficiency improvements over the no-spine baseline, some showed greater footfall consistency at higher speeds. Given the greater likeability drawn from the more natural locomotion patterns, this type of robot displays potential for applications in social robot scenarios such as elderly care, where energy efficiency is not a primary concern.
comment: 27 pages, 13 figures
A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints
We present a real-time safety filter for motion planning, such as learning-based methods, using Control Barrier Functions (CBFs), which provides formal guarantees for collision avoidance with road boundaries. A key feature of our approach is its ability to directly incorporate road geometries of arbitrary shape without resorting to conservative overapproximations. We formulate the safety filter as a constrained optimization problem in the form of a Quadratic Program (QP). It achieves safety by making minimal, necessary adjustments to the control actions issued by the nominal motion planner. We validate our safety filter through extensive numerical experiments across a variety of traffic scenarios featuring complex roads. The results confirm its reliable safety and high computational efficiency (execution frequency up to 40 Hz). Code & Video Demo: github.com/bassamlab/SigmaRL
Riemannian Direct Trajectory Optimization of Rigid Bodies on Matrix Lie Groups
Designing dynamically feasible trajectories for rigid bodies is a fundamental problem in robotics. Although direct trajectory optimization is widely applied to solve this problem, inappropriate parameterizations of rigid body dynamics often result in slow convergence and violations of the intrinsic topological structure of the rotation group. This paper introduces a Riemannian optimization framework for direct trajectory optimization of rigid bodies. We first use the Lie Group Variational Integrator to formulate the discrete rigid body dynamics on matrix Lie groups. We then derive the closed-form first- and second-order Riemannian derivatives of the dynamics. Finally, this work applies a line-search Riemannian Interior Point Method (RIPM) to perform trajectory optimization with general nonlinear constraints. As the optimization is performed on matrix Lie groups, it is correct-by-construction to respect the topological structure of the rotation group and be free of singularities. The paper demonstrates that both the derivative evaluations and Newton steps required to solve the RIPM exhibit linear complexity with respect to the planning horizon and system degrees of freedom. Simulation results illustrate that the proposed method is faster than conventional methods by an order of magnitude in challenging robotics tasks.
comment: Accepted to Robotics: Science and Systems (RSS) 2025
Robustly Invertible Nonlinear Dynamics and the BiLipREN: Contracting Neural Models with Contracting Inverses
We study the invertibility of nonlinear dynamical systems from the perspective of contraction and incremental stability analysis and propose a new invertible recurrent neural model: the BiLipREN. In particular, we consider a nonlinear state space model to be robustly invertible if an inverse exists with a state space realisation, and both the forward model and its inverse are contracting, i.e. incrementally exponentially stable, and Lipschitz, i.e. have bounded incremental gain. This property of bi-Lipschitzness implies both robustness in the sense of sensitivity to input perturbations, as well as robust distinguishability of different inputs from their corresponding outputs, i.e. the inverse model robustly reconstructs the input sequence despite small perturbations to the initial conditions and measured output. Building on this foundation, we propose a parameterization of neural dynamic models: bi-Lipschitz recurrent equilibrium networks (biLipREN), which are robustly invertible by construction. Moreover, biLipRENs can be composed with orthogonal linear systems to construct more general bi-Lipschitz dynamic models, e.g., a nonlinear analogue of minimum-phase/all-pass (inner/outer) factorization. We illustrate the utility of our proposed approach with numerical examples.
$\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems by multivariate rational interpolation
This paper addresses the $\mathcal{H}_2$-optimal approximation of linear dynamical systems with quadratic-output functions, also known as linear quadratic-output systems. Our major contributions are threefold. First, we derive interpolation-based first-order optimality conditions for the linear quadratic-output $\mathcal{H}_2$ minimization problem. These conditions correspond to the mixed-multipoint tangential interpolation of the full-order linear- and quadratic-output transfer functions, and generalize the Meier-Luenberger optimality framework for the $\mathcal{H}_2$-optimal model reduction of linear time-invariant systems. Second, given the interpolation data, we show how to enforce these mixed-multipoint tangential interpolation conditions explicitly by Petrov-Galerkin projection of the full-order model matrices. Third, to find the optimal interpolation data, we build on this projection framework and propose a generalization of the iterative rational Krylov algorithm for the $\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems, called LQO-IRKA. Upon convergence, LQO-IRKA produces a reduced linear quadratic-output system that satisfies the interpolatory optimality conditions. The method only requires solving shifted linear systems and matrix-vector products, thus making it suitable for large-scale problems. Numerical examples are included to illustrate the effectiveness of the proposed method.
Computing in Integrated Terrestrial and Non-Terrestrial Networks: A Comprehensive Survey
The rapid growth of Internet-of-things (IoT) devices, smart vehicles, and other connected objects is driving demand for ubiquitous connectivity and intensive computing capacity. 5G and upcoming 6G networks are crucial to meeting these demands and the fast-evolving services and applications. However, traditional terrestrial networks face limitations in coverage and capacity. Integrated Terrestrial and Non-Terrestrial Networks (ITNTN) are emerging to address these challenges. In essence, ITNTN combines ground-based infrastructure with aerial, space, and water surface networks to provide seamless connectivity and computing resources anytime, anywhere. Given the stringent quality-of-service (QoS) of future services, edge computing will be an inseparable component of ITNTN. Consequently, we dive in this survey into current efforts of integrating cloud/fog/edge computing into ITNTN layers to facilitate stringent QoS services and address the data processing needs of modern applications. Since there have been only limited and partial efforts in integrating computing functionalities within ITNTN, we aim to extend the discussion to the full integration of computing and identifying the challenges and future research directions to achieve it.
comment: 20 pages, 7 figures, 9 tables. Submitted to IEEE Access
A Practical Approach Towards Inertia Estimation Using Ambient Synchrophasor Data
Real-time tracking of inertia is important because it reflects the power system's ability to withstand contingencies and maintain frequency security. This paper proposes a practical approach to estimate inertia using ambient phasor measurement unit (PMU) data and a partitioned form of the swing equation. The approach accounts for (bounded) uncertainties in network parameters and PMU measurements, enabling precise estimation of inertia and damping constants, as well as mechanical power inputs. Instead of assuming constant mechanical power input throughout, the approach leverages knowledge of power system operations to determine intervals when it is actually constant to maintain estimation consistency. Simulation results on the IEEE 14-bus system and IEEE 39 bus system integrated with renewable energy sources affirm the method's accuracy and applicability.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety and actively regulate environmental coupling, allowing, for instance, safe object manipulation under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
In vivo validation of Wireless Power Transfer System for Magnetically Controlled Robotic Capsule Endoscopy
This paper presents the in vivo validation of an inductive wireless power transfer (WPT) system integrated for the first time into a magnetically controlled robotic capsule endoscopy platform. The proposed system enables continuous power delivery to the capsule without the need for onboard batteries, thus extending operational time and reducing size constraints. The WPT system operates through a resonant inductive coupling mechanism, based on a transmitting coil mounted on the end effector of a robotic arm that also houses an external permanent magnet and a localization coil for precise capsule manipulation. To ensure robust and stable power transmission in the presence of coil misalignment and rotation, a 3D receiving coil is integrated within the capsule. Additionally, a closed-loop adaptive control system, based on load-shift keying (LSK) modulation, dynamically adjusts the transmitted power to optimize efficiency while maintaining compliance with specific absorption rate (SAR) safety limits. The system has been extensively characterized in laboratory settings and validated through in vivo experiments using a porcine model, demonstrating reliable power transfer and effective robotic navigation in realistic gastrointestinal conditions: the average received power was 110 mW at a distance of 9 cm between the coils, with variable capsule rotation angles. The results confirm the feasibility of the proposed WPT approach for autonomous, battery-free robotic capsule endoscopy, paving the way for enhanced diagnostic in gastrointestinal medicine.
comment: 9 pages, 9 figures, regular paper
Data-Driven Reachability Analysis for Piecewise Affine Systems
Hybrid systems play a crucial role in modeling real-world applications where discrete and continuous dynamics interact, including autonomous vehicles, power systems, and traffic networks. Safety verification for these systems requires determining whether system states can enter unsafe regions under given initial conditions and uncertainties, a question directly addressed by reachability analysis. However, hybrid systems present unique difficulties because their state space is divided into multiple regions with distinct dynamic models, causing traditional data-driven methods to produce inadequate over-approximations of reachable sets at region boundaries where dynamics change abruptly. This paper introduces a novel approach using hybrid zonotopes for data-driven reachability analysis of piecewise affine systems. Our method addresses the boundary transition problem by developing computational algorithms that calculate the family of set models guaranteed to contain the true system trajectories. Additionally, we extend and evaluate three methods for set-based estimation that account for input-output data with measurement noise.
comment: 8 pages, 6 figures
Observability conditions for neural state-space models with eigenvalues and their roots of unity
We operate through the lens of ordinary differential equations and control theory to study the concept of observability in the context of neural state-space models and the Mamba architecture. We develop strategies to enforce observability, which are tailored to a learning context, specifically where the hidden states are learnable at initial time, in conjunction to over its continuum, and high-dimensional. We also highlight our methods emphasize eigenvalues, roots of unity, or both. Our methods effectuate computational efficiency when enforcing observability, sometimes at great scale. We formulate observability conditions in machine learning based on classical control theory and discuss their computational complexity. Our nontrivial results are fivefold. We discuss observability through the use of permutations in neural applications with learnable matrices without high precision. We present two results built upon the Fourier transform that effect observability with high probability up to the randomness in the learning. These results are worked with the interplay of representations in Fourier space and their eigenstructure, nonlinear mappings, and the observability matrix. We present a result for Mamba that is similar to a Hautus-type condition, but instead employs an argument using a Vandermonde matrix instead of eigenvectors. Our final result is a shared-parameter construction of the Mamba system, which is computationally efficient in high exponentiation. We develop a training algorithm with this coupling, showing it satisfies a Robbins-Monro condition under certain orthogonality, while a more classical training procedure fails to satisfy a contraction with high Lipschitz constant.
comment: Corrections and improvements to objective functions and new experiments
A Systematic Literature Review on Safety of the Intended Functionality for Automated Driving Systems
In the automobile industry, ensuring the safety of automated vehicles equipped with the Automated Driving System (ADS) is becoming a significant focus due to the increasing development and deployment of automated driving. Automated driving depends on sensing both the external and internal environments of a vehicle, utilizing perception sensors and algorithms, and Electrical/Electronic (E/E) systems for situational awareness and response. ISO 21448 is the standard for Safety of the Intended Functionality (SOTIF) that aims to ensure that the ADS operate safely within their intended functionality. SOTIF focuses on preventing or mitigating potential hazards that may arise from the limitations or failures of the ADS, including hazards due to insufficiencies of specification, or performance insufficiencies, as well as foreseeable misuse of the intended functionality. However, the challenge lies in ensuring the safety of vehicles despite the limited availability of extensive and systematic literature on SOTIF. To address this challenge, a Systematic Literature Review (SLR) on SOTIF for the ADS is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The objective is to methodically gather and analyze the existing literature on SOTIF. The major contributions of this paper are: (i) presenting a summary of the literature by synthesizing and organizing the collective findings, methodologies, and insights into distinct thematic groups, and (ii) summarizing and categorizing the acknowledged limitations based on data extracted from an SLR of 51 research papers published between 2018 and 2023. Furthermore, research gaps are determined, and future research directions are proposed.
comment: Scheduled to be published in SAE journal as technical paper as a part of non-technical event and will be available as open access in 2025
Conformal Predictive Programming for Chance Constrained Optimization
We propose conformal predictive programming (CPP), a framework to solve chance constrained optimization problems, i.e., optimization problems with constraints that are functions of random variables. CPP utilizes samples from these random variables along with the quantile lemma - central to conformal prediction - to transform the chance constrained optimization problem into a deterministic problem with a quantile reformulation. CPP inherits a priori guarantees on constraint satisfaction from existing sample average approximation approaches for a class of chance constrained optimization problems, and it provides a posteriori guarantees that are of conditional and marginal nature otherwise. The strength of CPP is that it can easily support different variants of conformal prediction which have been (or will be) proposed within the conformal prediction community. To illustrate this, we present robust CPP to deal with distribution shifts in the random variables and Mondrian CPP to deal with class conditional chance constraints. To enable tractable solutions to the quantile reformulation, we present a mixed integer programming method (CPP-MIP) encoding, a bilevel optimization strategy (CPP-Bilevel), and a sampling-and-discarding optimization strategy (CPP-Discarding). We also extend CPP to deal with joint chance constrained optimization (JCCO). In a series of case studies, we show the validity of the aforementioned approaches, empirically compare CPP-MIP, CPP-Bilevel, as well as CPP-Discarding, and illustrate the advantage of CPP as compared to scenario approach.
Defense Strategies for Autonomous Multi-agent Systems: Ensuring Safety and Resilience Under Exponentially Unbounded FDI Attacks
False data injection attacks pose a significant threat to autonomous multi-agent systems (MASs). Existing attack-resilient control strategies generally have strict assumptions on the attack signals and overlook safety constraints, such as collision avoidance. In practical applications, leader agents equipped with advanced sensors or weaponry span a safe region to guide heterogeneous follower agents, ensuring coordinated operations while addressing collision avoidance to prevent financial losses and mission failures. This letter addresses these gaps by introducing and solving the safety-aware and attack-resilient (SAAR) control problem under exponentially unbounded false data injection (EU-FDI) attacks. Specifically, a novel attack-resilient observer layer (OL) is first designed to defend against EU-FDI attacks on the OL. Then, an attack-resilient compensational signal is designed to mitigate the adverse effects caused by the EU-FDI attack on control input layer (CIL). Finally, a SAAR controller is designed by solving a quadratic programming (QP) problem integrating control barrier function (CBF) certified collision-free safety constraints. Rigorous Lyapunov-based stability analysis certifies the SAAR controller's effectiveness in ensuring both safety and resilience. This study also pioneers a three-dimensional (3D) simulation of the SAAR containment control problem for heterogeneous MASs, demonstrating its applicability in realistic multi-agent scenarios.
Reward-Based Collision-Free Algorithm for Trajectory Planning of Autonomous Robots
This paper proposes a novel mission planning algorithm for autonomous robots that selects an optimal waypoint sequence from a predefined set to maximize total reward while satisfying obstacle avoidance, state, input, derivative, mission time, and distance constraints. The formulation extends the prize-collecting traveling salesman problem. A tailored genetic algorithm evolves candidate solutions using a fitness function, crossover, and mutation, with constraint enforcement via a penalty method. Differential flatness and clothoid curves are employed to penalize infeasible trajectories efficiently, while the Euler spiral method ensures curvature-continuous trajectories with bounded curvature, enhancing dynamic feasibility and mitigating oscillations typical of minimum-jerk and snap parameterizations. Due to the discrete variable length optimization space, crossover is performed using a dynamic time-warping-based method and extended convex combination with projection. The algorithm's performance is validated through simulations and experiments with a ground vehicle, quadrotor, and quadruped, supported by benchmarking and time-complexity analysis.
Linear Model of Aggregated Homogeneous Energy Storage Elements with Realizable Dispatch Guarantees
To optimize the dispatch of batteries, a model is required that can predict the state of charge (SOC) trajectory for a chosen open-loop power schedule to ensure admissibility (i.e., that schedule can be realized). However, battery dispatch optimization is inherently challenging since batteries cannot simultaneously charge and discharge, which begets a non-convex complementarity constraint. In this paper, we develop a novel composition of energy storage elements that can charge or discharge independently and provide a sufficient linear energy storage model of the composite battery. This permits convex optimization of the composite battery SOC trajectory while guaranteeing admissibility of the resulting (aggregated) power schedule and its disaggregation to the individual energy storage elements.
NPGA: A Unified Algorithmic Framework for Decentralized Constraint-Coupled Optimization
This work focuses on a class of general decentralized constraint-coupled optimization problems. We propose a novel nested primal-dual gradient algorithm (NPGA), which can achieve linear convergence under the weakest known condition, and its theoretical convergence rate surpasses all known results. More importantly, NPGA serves not only as an algorithm but also as a unified algorithmic framework, encompassing various existing algorithms as special cases. By designing different network matrices, we can derive numerous versions of NPGA and analyze their convergences by leveraging the convergence results of NPGA conveniently, thereby enabling the design of more efficient algorithms. Finally, we conduct numerical experiments to compare the convergence rates of NPGA and existing algorithms, providing empirical evidence for the superior performance of NPGA.